Abstract

For a number of problems in the theory of online algorithms, it is known that the assumption 
that elements arrive in uniformly random order enables the design of algorithms with much 
better performance guarantees than under worst-case assumptions. The quintessential example 
of this phenomenon is the secretary problem, in which an algorithm attempts to stop a sequence 
at the moment it observes the maximum value in the sequence. As is well known, if the sequence 
is presented in uniformly random order there is an algorithm that succeeds with probability 1/e, 
whereas no non-trivial performance guarantee is possible if the elements arrive in worst-case 
order. 

In many of the applications of online algorithms, it is reasonable to assume there is some
randomness in the input sequence, but unreasonable to assume that the arrival ordering is
uniformly random. This work initiates an investigation into relaxations of the random-ordering
hypothesis in online algorithms, by focusing on the secretary problem and asking what
performance guarantees one can prove under relaxed assumptions. Toward this end, we present two
sets of properties of distributions over permutations as sufficient conditions, called the
$(p,q,\delta)$-block-independence property and the $(k,\delta)$-uniform-induced-ordering property.
We show these two are asymptotically equivalent by borrowing techniques from approximation
theory. Moreover, we show they both imply the existence of secretary algorithms with constant
probability of correct selection, approaching the optimal constant $1/e$ as the related parameters
of the property tend towards their extreme values. Both of these properties are significantly
weaker than the usual assumption of uniform randomness; we substantiate this by providing
several constructions of distributions that satisfy $(p,q,\delta)$-block-independence. As one
application of our investigation, we prove that $\Theta(\log\log n)$ is the minimum entropy of any
permutation distribution that permits constant probability of correct selection in the secretary
problem with $n$ elements. While our block-independence condition is sufficient for constant
probability of correct selection, it is not necessary; however, we present complexity-theoretic
evidence that no simple necessary and sufficient criterion exists. Finally, we explore the extent
to which the performance guarantees of other algorithms are preserved when one relaxes the
uniform random ordering assumption to $(p,q,\delta)$-block-independence, obtaining a positive result
for Kleinberg's multiple-choice secretary algorithm and a negative result for the weighted
bipartite matching algorithm of Korula and Pal.
1 Introduction


A recurring theme in the theory of online algorithms is that algorithms may perform much
better when their input is in (uniformly) random order than when the ordering is worst-case. The
quintessential example of this phenomenon is the secretary problem, in which an algorithm
attempts to stop a sequence at the moment it observes the maximum value in the sequence. As is
well known, if the sequence is presented in uniformly random order there is an algorithm that
succeeds with probability $1/e$, whereas no non-trivial performance guarantee is possible if the
elements arrive in worst-case order.

In many of the applications of online algorithms, it is reasonable to assume there is some 
randomness in the input sequence, but unreasonable to assume that the input ordering is uniformly 
random. It is therefore of interest to ask which algorithms have robust performance guarantees, 
in the sense that the performance guarantee holds not only when the input order is drawn from 
the uniform distribution, but whenever the input order is drawn from a reasonably broad family 
of distributions that includes the uniform one. In other words, we seek relaxations of the standard 
random-ordering hypothesis which are weak enough to include many distributions of interest, but 
strong enough to enable one to prove the same (or qualitatively similar) performance guarantees 
for online algorithms. 

This work initiates an investigation into relaxations of the random-ordering hypothesis in online 
algorithms, by focusing on the secretary problem and asking what performance guarantees one 
can prove under relaxed assumptions. In the problems we consider there are three parties: an 
adversary that assigns values to items, nature which permutes the items into a random order, and 
an algorithm that observes the items and their values in the order specified by nature. To state
our results, let us say that a distribution over permutations is secretary-admissible (abbreviated
s-admissible) if, when nature uses this distribution to sample the ordering of items, there
exists an algorithm that is guaranteed at least a constant probability of selecting the element
of maximum value, no matter what values the adversary assigns to elements. If this constant
probability approaches $1/e$ as the number of elements, $n$, goes to infinity, we say that the
distribution is secretary-optimal (s-optimal).

Question 1: What natural properties of a distribution suffice to guarantee that it is
s-admissible? What properties suffice to guarantee that it is s-optimal?

For example, rather than assuming that the ordering of the entire $n$-tuple of items is uniformly
random, suppose we fix a constant $k$ and assume that for every $k$-tuple of distinct items, the
relative order in which they appear in the input sequence is $\delta$-close to uniform. Does this
imply that the distribution is s-admissible? In Section 2 we formalize this
$(k,\delta)$-uniform-induced-ordering property (UIOP), and we prove that it implies s-admissibility
for $k \ge 3$ and approaches s-optimality as $k \to \infty$ and $\delta \to 0$. To prove this, we
relate the uniform-induced-ordering property to another property, the
$(p,q,\delta)$-block-independence property (BIP), which may be of independent interest. Roughly
speaking, the block-independence property asserts that the joint distribution of arrival times of
any $p$ distinct elements, when considered at coarse enough granularity, is $\delta$-close to $p$
i.i.d. samples from the uniform distribution. While this property may sound much stronger than the
UIOP, we show that it is actually implied by the UIOP for sufficiently large $k$ and small $\delta$.

To substantiate the notion that these properties are satisfied by many interesting distributions
that are far from uniform, we show that they apply to several natural families of permutation
distributions, including almost every uniform distribution with support size $\omega(\log n)$, and
the distribution over linear orderings defined by taking any $n$ sufficiently “incoherent” vectors
and projecting them onto a random line.
A distinct but related topic in the theory of computing is pseudorandomness, which shares a
similar emphasis on showing that performance guarantees of certain classes of algorithms are
preserved when one replaces the uniform distribution over inputs with suitably chosen non-uniform
distributions, specifically those having low entropy. While our interest in s-admissibility and the
$(k,\delta)$-UIOP is primarily motivated by the considerations of robustness articulated earlier,
the analogy with pseudorandomness prompts a natural set of questions.

Question 2: What is the minimum entropy of an s-admissible distribution? What is
the minimum entropy of a distribution that satisfies the $(k,\delta)$-UIOP? Is there an explicit
construction that achieves the minimum entropy?

In Sections 2.3.2 and 3 we supply matching upper and lower bounds to answer the first two
questions. The answer is the same in both cases, and it is surprisingly small:
$\Theta(\log\log n)$ bits. Moreover, $\Theta(\log\log n)$ bits suffice not just for s-admissibility,
but for s-optimality! We also supply an explicit construction, using Reed-Solomon codes, of
distributions with $\Theta(\log\log n)$ bits of entropy that satisfy all of these properties.

Given that the $(k,\delta)$-UIOP is a sufficient condition for s-admissibility, that it is satisfied
in every natural construction of s-admissible distributions that we know of, and that the minimum
entropy of $(k,\delta)$-UIOP distributions matches the minimum entropy of s-admissible
distributions, it is tempting to hypothesize that the $(k,\delta)$-UIOP (or something very similar)
is both necessary and sufficient for s-admissibility.

Question 3: Find a natural necessary and sufficient condition that characterizes the
property of s-admissibility.

In Section 4 we show that, unfortunately, this is probably impossible. We construct a strange distribution
over input orderings that is s-admissible, but any algorithm achieving constant probability of correct 
selection must use a stopping rule that cannot be computed by circuits of size . The 

construction makes use of a coding-theoretic construction that may be of independent interest: a
binary error-correcting code of block length $n$ and message length $m = o(n)$, such that if one
erases any $n - 2m$ symbols of the received vector, most messages can still be uniquely decoded
even if $\Omega(m)$ of the remaining $2m$ symbols are adversarially corrupted.

Finally, we broaden our scope and consider other online problems with randomly-ordered inputs. 

Question 4: Are the performance guarantees of other online algorithms in the
uniform-random-order model (approximately) preserved when one relaxes the assumption about
the input order to the $(k,\delta)$-UIOP or the $(p,q,\delta)$-BIP? If the performance guarantee is
not always preserved in general, what additional properties of an algorithm suffice to
ensure that its performance guarantee is preserved?


This is an open-ended question, but we take some initial steps toward answering it by looking 
at two generalizations of the secretary problem: the multiple-choice secretary problem (a.k.a. the 
uniform matroid secretary problem) and the online bipartite weighted matching problem. We show 
that the algorithm of Kleinberg [25] for the former problem preserves its performance guarantee,
and the algorithm of Korula and Pal [1^ for the latter problem does not. 


Related Work. The secretary problem was solved by Lindley 281 and Dynkin 1^. A sequence 
of papers relating secretary problems to online mechanism design lio, 0, IH] touched off a flurry of 
CS research during the past 10 years. Much of this research has focused on the so-called matroid 
secretary problem, which remains unsolved despite a string of breakthroughs including a recent 
pair of $O(\log\log r)$-competitive algorithms [27], where $r$ is the matroid rank. Generalizations
are known for weighted matchings in graphs and hypergraphs l3. [j^. 26], independent sets 19|. 
knapsack constraints Qj, and submodular payoff functions [3, 1^, among others. Of particular 
relevance to our work is the free order model 2ll| ; our results on the minimum entropy s-admissible 
distribution can be regarded as a randomness-efficient secretary algorithm in the free-order model. 

The uniform-random-ordering hypothesis has been applied to many other problems in online 
algorithms, perhaps most visibly to the AdWords problem fl^ . [l^ and its generalizations to online
linear programming with packing constraints [3, 13, 24, 32], and online convex programming [l|| .
Applications of the random-order hypothesis in minimization settings are more rare; see [29, 30]
for applications in the context of facility location and network design. 

In seeking a middle ground between worst-case and average-case analysis, our work contributes
to a broad-based research program going by the name of “beyond worst-case analysis” 3^ . In terms
of motivation, there are clear conceptual parallels between our paper and the work of Mitzenmacher
and Vadhan [^, who study hashing and identify hypotheses on the data-generating process, much 
weaker than uniform randomness, under which random hashing using a 2-universal hash family has 
provably good performance, although at a technical level our paper bears no relation to theirs. 

The properties of permutation distributions that we identify in our work bear a resemblance
to almost $k$-wise independent permutations (e.g., [^), but the $(k,\delta)$-UIOP and
$(p,q,\delta)$-BIP are much weaker, and consequently permutation distributions satisfying these
properties are much more prevalent than almost $k$-wise independent permutations.


Setting and Notations. We consider problems in which an algorithm selects one or more
elements from a set $\mathcal{U}$ of $n$ items. Items are presented sequentially, and an algorithm
may only select items at the time when they are presented. In the secretary problem the items are
totally ordered by value, and the algorithm is allowed to select only one element of the input
sequence, with the goal of selecting the item of maximum value. Algorithms for the secretary
problem are assumed to be comparison-based¹, meaning their decision whether to select the item
presented at time $t$ must be based only on the relative ordering (by value) of the first $t$
elements that arrived. Algorithms are evaluated according to their probability of correct
selection, i.e., the probability of selecting the item of maximum value.

We assume that the set $\mathcal{U}$ of items is $[n] = \{1,\dots,n\}$. The order in which items are
presented is then represented by a permutation $\pi$ of $[n]$, where $\pi(i)$ denotes the position
of item $i$ in the input sequence. Similarly, the ordering of items by value can be represented by
a permutation $\sigma$ of $[n]$, where $\sigma(j) = i$ means that the $j$-th largest item is $i$.
Then, the input sequence observed by the algorithm is completely described by the composition
$\pi\sigma$.


2 Sufficient Properties of Non-Uniform Probability Distributions 

In Section 1 we introduced two properties of non-uniform probability distributions which suffice to
ensure existence of a secretary algorithm with constant probability of correct selection. (In other
words, the two properties imply s-admissibility.) We begin by formally defining these two properties.

Definition 1. A distribution $\boldsymbol{\pi}$ over permutations of $[n]$ satisfies the
$(k,\delta)$-uniform-induced-ordering property, abbreviated $(k,\delta)$-UIOP, if and only if, for
every $k$ distinct items $x_1,\dots,x_k \in [n]$, if $\pi$ is a random sample from
$\boldsymbol{\pi}$ then
\[ \Pr\left[\pi(x_1) < \pi(x_2) < \cdots < \pi(x_k)\right] \ge (1-\delta)\,\frac{1}{k!}. \]
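
As an illustration of Definition 1, the following is a minimal sketch (not part of the original paper) that computes, by brute force, the smallest $\delta$ for which a small explicit distribution satisfies the $(k,\delta)$-UIOP. The function name `uiop_delta` and the representation of a distribution as a dictionary from position tuples to probabilities are assumptions made for this example only.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def uiop_delta(dist, k):
    """Smallest delta such that `dist` satisfies the (k, delta)-UIOP.

    `dist` maps a permutation to its probability. A permutation is a tuple
    whose entry at index i is the (1-based) position of item i in the input
    sequence. This representation is an assumption of the sketch.
    """
    n = len(next(iter(dist)))
    worst = Fraction(1)  # smallest value of k! * Pr[pi(x1) < ... < pi(xk)]
    for xs in permutations(range(n), k):  # ordered k-tuples of distinct items
        prob = sum(
            (Fraction(w) for pi, w in dist.items()
             if all(pi[a] < pi[b] for a, b in zip(xs, xs[1:]))),
            Fraction(0),
        )
        worst = min(worst, prob * factorial(k))
    return 1 - worst


# Two-point distribution: one permutation and its reverse.
identity = (1, 2, 3, 4)
reverse = (4, 3, 2, 1)
two_point = {identity: Fraction(1, 2), reverse: Fraction(1, 2)}

# Uniform distribution over all 24 permutations of four items.
uniform = {p: Fraction(1, 24) for p in permutations(range(1, 5))}
```

Under these assumptions, `uiop_delta(two_point, 2)` evaluates to 0 while `uiop_delta(two_point, 3)` evaluates to 1, which matches the observation that a permutation paired with its reverse is perfectly uniform on pairs but leaves some orderings of triples with probability zero.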

¹ This assumption of comparison-based algorithms is standard in the literature on secretary
problems. Samuels [35] proved that when the input order is uniformly random, it is impossible to
achieve probability of correct selection $1/e + \varepsilon$ for any constant $\varepsilon > 0$,
even if the algorithm is allowed to observe the values.
The $(k,\delta)$-uniform-induced-ordering property is a very natural assumption, and it is rather
easy to show that a given probability distribution fulfills it. We will demonstrate this with a few
examples in Section 2.3. However, it is not clear how to analyze algorithms for secretary problems
based on this property. To this end, the more technical $(p,q,\delta)$-block-independence property
is more helpful. We show this by analyzing the classic algorithm for the secretary problem in
Section 2.1 and the $k$-uniform matroid secretary problem in Section 5. However, one of our main
results in Section 2.2 is that these two properties are in fact equivalent, in the limit as the
parameters $k, p, q \to \infty$ and $\delta \to 0$.

Definition 2. Given a positive integer $q \le n$, partition $[n]$ into $q$ consecutive disjoint
blocks of size between $\lfloor n/q \rfloor$ and $\lceil n/q \rceil$ each, denoted by
$B_1,\dots,B_q \subseteq [n]$. A permutation distribution $\boldsymbol{\pi}$ satisfies the
$(p,q,\delta)$-block-independence property, abbreviated $(p,q,\delta)$-BIP, if for any distinct
$x_1,\dots,x_p \in [n]$ and any $b_1,\dots,b_p \in [q]$,

\[ \Pr\Big[\bigwedge_{i\in[p]} \pi(x_i) \in B_{b_i}\Big] \ge (1-\delta)\left(\frac{1}{q}\right)^{p}. \]

Note that $b_1,\dots,b_p$ do not necessarily have to be distinct. To simplify notation, given a
permutation $\pi$ of $[n]$, we define a function $\pi^B\colon \mathcal{U} \to [q]$ by setting
$\pi^B(x) = i$ if and only if $\pi(x) \in B_i$, for all $x \in \mathcal{U}$.
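
To make the block-independence property concrete, here is a small brute-force sketch (an illustration, not the paper's method) that computes the smallest $\delta$ for which an explicit distribution satisfies the $(p,q,\delta)$-BIP, reading the property as a lower bound of $(1-\delta)q^{-p}$ on every joint block assignment. The helper names `block_of` and `bip_delta`, and the particular way of cutting $[n]$ into $q$ consecutive blocks, are assumptions of the example.

```python
from fractions import Fraction
from itertools import permutations, product


def block_of(position, n, q):
    """Index in 1..q of the block containing a 1-based position.

    Blocks are consecutive and have size floor(n/q) or ceil(n/q); this
    specific partition is one admissible choice, assumed for illustration.
    """
    return (position - 1) * q // n + 1


def bip_delta(dist, p, q):
    """Smallest delta such that `dist` satisfies the (p, q, delta)-BIP.

    `dist` maps a permutation (tuple of 1-based positions, indexed by item)
    to its probability.
    """
    n = len(next(iter(dist)))
    target = Fraction(1, q ** p)  # probability under p i.i.d. uniform blocks
    worst = Fraction(1)
    for xs in permutations(range(n), p):             # distinct items x_1..x_p
        for bs in product(range(1, q + 1), repeat=p):  # blocks b_1..b_p
            prob = sum(
                (Fraction(w) for pi, w in dist.items()
                 if all(block_of(pi[x], n, q) == b for x, b in zip(xs, bs))),
                Fraction(0),
            )
            worst = min(worst, prob / target)
    return 1 - worst


uniform4 = {perm: Fraction(1, 24) for perm in permutations(range(1, 5))}
```

For the uniform distribution on four items with $p = q = 2$, two distinct items share a block with probability $1/6$ rather than $1/4$, so `bip_delta(uniform4, 2, 2)` is $1/3$. This shows why the property is meant to be used with blocks that are coarse relative to $n$.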


2.1 Secretary Algorithms and the $(p,q,\delta)$-block-independence property


Next, we will analyze the standard threshold algorithm for the secretary problem under probability
distributions that only fulfill the $(p,q,\delta)$-block-independence property rather than being
uniform. The algorithm only observes the first $n/e$ items. Afterwards, it accepts the first item
whose value exceeds all values seen up to this point. Under a uniform distribution, this algorithm
picks the best item with probability at least $\frac{1}{e} - o(1)$. We show that already for small
constant values of $p$ and $q$ and rather large constant values of $\delta$ this algorithm has
constant success probability. At the same time, for large $p$ and $q$ and small $\delta$, the
probability converges to $1/e$.
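
The threshold rule described above can be sketched as follows. This is an illustrative simulation under the uniform random order only; the function names `threshold_rule` and `success_rate`, and the choice of rounding $n/e$ down, are assumptions of the example rather than details taken from the paper.

```python
import math
import random


def threshold_rule(values):
    """Return the index chosen by the classic threshold rule, or None.

    The first floor(n/e) items are only observed. Afterwards the first item
    that beats everything seen so far is accepted.
    """
    n = len(values)
    cutoff = int(n / math.e)
    best_seen = max(values[:cutoff], default=float("-inf"))
    for t in range(cutoff, n):
        if values[t] > best_seen:
            return t
    return None  # the maximum was in the observation phase


def success_rate(n, trials, rng):
    """Empirical probability of selecting the maximum under uniform order."""
    wins = 0
    for _ in range(trials):
        values = list(range(n))
        rng.shuffle(values)
        pick = threshold_rule(values)
        wins += pick is not None and values[pick] == n - 1
    return wins / trials
```

With a seeded generator such as `random.Random(0)`, calling `success_rate(50, 20000, rng)` should land close to $1/e \approx 0.368$. Replacing `rng.shuffle` with a sampler for a non-uniform permutation distribution is one way to experiment with the relaxed assumptions studied in this section.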

Theorem 1. Under a {p,q, 6)-block-independent probability distribution, the standard secretary 
algorithm picks the best item with probability at least ^ — (5 — (l — 

Proof Sketch. Let $T = \lceil q/e \rceil$ denote the index of the block in which the threshold is
located. Furthermore, let $x_j \in \mathcal{U}$ be the $j$-th best item. We condition on the event
that $x_1$ comes in the block with index $i > T$. To ensure that our algorithm picks this item, it
suffices that $x_2$ comes in blocks $1,\dots,T-1$. Alternatively, we also pick $x_1$ if $x_2$ comes
in blocks $i+1,\dots,q$ and $x_3$ comes in blocks $1,\dots,T-1$. Continuing this argument, we get

\[ \Pr[\text{correct selection}] \ge \sum_{i=T+1}^{q} \sum_{j=2}^{p} \Pr\left[\pi^B(x_1) = i,\ \pi^B(x_2),\dots,\pi^B(x_{j-1}) > i,\ \pi^B(x_j) < T\right]. \]

Note that the $(p,q,\delta)$-BIP implies the $(p',q,\delta)$-BIP for any $p' < p$, simply by
marginalizing over the remaining indices in the tuple. This gives us:

\[ \Pr[\text{correct selection}] \ge \sum_{i=T+1}^{q} \sum_{j=2}^{p} (1-\delta)\,\frac{1}{q}\left(\frac{q-i}{q}\right)^{j-2}\frac{T-1}{q}, \]

and the theorem follows after manipulating the expression on the right side and applying some
standard bounds. □
2.2 Relationship Between the Two Properties

We will show that the two properties defined in the preceding section are in some sense equivalent
in the limit as the parameters $k,p,q \to \infty$ and $\delta \to 0$. (For $k = 2$, a distribution
satisfying the $(k,\delta)$-UIOP is not even necessarily s-admissible: this is an easy consequence
of the lower bound in Section 3 and the fact that the $(2,0)$-UIOP is achieved by a distribution
with support size 2 that uniformly randomizes between a single permutation and its reverse.
Already for $k = 3$ and any constant $\delta < 1$, the $(k,\delta)$-UIOP implies s-admissibility;
this is shown in Appendix A.)

Our first result is relatively straightforward: any probability distribution that fulfills the
$(p,q,\delta)$-BIP also fulfills the $(p, \delta + \frac{p^2}{q})$-UIOP. The (easy) proof is
deferred to Appendix B.1.2.

Theorem 2. If a distribution over permutations fulfills the $(p,q,\delta)$-BIP, then it also
fulfills the $(p, \delta + \frac{p^2}{q})$-UIOP.

The other direction is far less obvious. Observe that the $(k,\delta)$-uniform-induced-ordering
property works in a purely local sense: even for a single item $x \in \mathcal{U}$, the
distribution of its position $\pi(x)$ can be far from uniform. For example, the case $k = 2$ is
even fulfilled by a two-point distribution that only includes one permutation and its reverse.
Then $\pi(x)$ can only attain two different values. Nevertheless, we have the following result.

Theorem 3. If a distribution over permutations fulfills the $(k,\delta)$-uniform-induced-ordering
property, then it also satisfies the $(p,q,\delta)$-block-independence property for
$p = o(k^{1/5})$, $q = O(k^{1/5})$ as $k$ goes to infinity.

The proof applies the theory of approximation of functions, which addresses the question of
how well one can approximate arbitrary functions by polynomials. The main insight underlying
the proof is the following. If $\boldsymbol{\pi}$ satisfies the $(k,\delta)$-UIOP, then for any
$k$-tuple of distinct elements $x_1,\dots,x_k$, if one defines random variables
$X_i = \pi(x_i)/n$, then the expected value of any monomial of total degree $k/2$ in the variables
$\{X_i\}$ approximates the expected value of that same monomial under the distribution of a
uniformly random permutation. With this lemma in hand, proving Theorem 3 becomes a matter of
quantifying how well the indicator function of a (multi-dimensional) rectangle can be approximated
by low-degree polynomials. Approximation theory furnishes such estimates readily. To make the proof
sketch concrete, we start with some definitions and notation from approximation theory; see, e.g.,
the textbook by Carothers [10].

Definition 3 ([10]). If $f$ is any bounded function over $[0,1]$, we define the sequence of
Bernstein polynomials for $f$ by

\[ (B_d(f))(x) = \sum_{k=0}^{d} f(k/d)\binom{d}{k} x^{k}(1-x)^{d-k}, \qquad 0 \le x \le 1. \tag{1} \]

Remark 1. $B_d(f)$ is a polynomial of degree at most $d$.
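
The Bernstein polynomials of Definition 3 are easy to evaluate numerically. The following minimal sketch (an illustration only) builds $B_d(f)$ for a given $f$ and estimates the sup-norm error on a grid; the names `bernstein` and `sup_error`, the grid size, and the test function $|x - 1/2|$ are choices made for this example.

```python
from math import comb


def bernstein(f, d):
    """Return the degree-d Bernstein polynomial of f as a Python function."""
    def b(x):
        return sum(
            f(k / d) * comb(d, k) * x ** k * (1 - x) ** (d - k)
            for k in range(d + 1)
        )
    return b


def sup_error(f, d, grid=1000):
    """Grid estimate of sup |f(x) - B_d(f)(x)| over [0, 1]."""
    b = bernstein(f, d)
    return max(abs(f(i / grid) - b(i / grid)) for i in range(grid + 1))


def kink(x):
    """A continuous but non-smooth test function."""
    return abs(x - 0.5)
```

Bernstein polynomials reproduce linear functions exactly, and for a continuous function such as `kink` the estimated error shrinks as the degree grows, for example when comparing `sup_error(kink, 16)` with `sup_error(kink, 64)`.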

Definition 4 ([10]). The modulus of continuity of a bounded function $f$ over $[a,b]$ is defined by

\[ \omega_f(\delta) = \sup\{|f(x_1) - f(x_2)| : x_1, x_2 \in [a,b],\ |x_1 - x_2| \le \delta\}. \tag{2} \]

Remark 2. Bounded function / is continuous over interval [a,b] if and only if ujf{5) = 0{6). 
Moreover, / is uniformly continuous if and only if ojf{5) = o(d). 

We are now ready to state our main ingredient, i.e., Bernstein's approximation theorem, which
shows that bounded functions with enough continuity are well approximated by Bernstein polynomials.

Theorem 4. For any bounded function $f$ over $[0,1]$ we have

\[ \|f - B_d(f)\|_\infty \le \frac{3}{2}\,\omega_f\!\left(\frac{1}{\sqrt{d}}\right), \tag{3} \]

where for any bounded functions $f_1$ and $f_2$,
$\|f_1 - f_2\|_\infty = \sup\{|f_1(x) - f_2(x)| : x \in [0,1]\}$.

Proof of Theorem 3. To prove our claim, we start by showing that the
$(k,\delta)$-uniform-induced-ordering property forces the arrival times of items to have almost the
same higher-order moments as uniform independent random variables. More precisely, we have the
following lemma (the proof is provided in Appendix B.1.2).

Lemma 1. Suppose $\pi$ is drawn from a permutation distribution satisfying the
$(k,\delta)$-uniform-induced-ordering property, and $\{x_1,\dots,x_p\}$ is an arbitrary set of $p$
items. Let $\phi\colon [n] \to \{i/n : i \in [n]\}$ be a uniform random mapping, and let
$X_i = \pi(x_i)/n$ for all $i \in [p]$. Then for every $k_i \le \frac{k}{p}$ we have

\[ \mathbb{E}\Big[\prod_{i=1}^{p} X_i^{k_i}\Big] \ge (1-\delta)\,\mathbb{E}\Big[\prod_{i=1}^{p} \phi(x_i)^{k_i}\Big]. \]

Given Lemma 1, roughly speaking, the key idea for the rest of the proof is to view probabilities
as expectations of indicator functions, and then to approximate the indicator functions by
polynomials. To compute probabilities, all we then need are moments, which by Lemma 1 are almost
equal to those of uniform independent random variables. Rigorously, we prove the following
probabilistic lemma using this idea. (The proof is provided in Appendix B.1.2.)

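Lemma 2. Let $\phi\colon [n] \to \{i/n : i \in [n]\}$ be a uniform random mapping. Furthermore, let
$X_1, X_2, \dots, X_p$ be random variables over $[0,1]$ such that for every $k_i \le d$ we have

\[ \mathbb{E}\Big[\prod_{i=1}^{p} X_i^{k_i}\Big] \ge \mathbb{E}\Big[\prod_{i=1}^{p} \phi(i)^{k_i}\Big](1-\delta). \]

Then for any disjoint intervals $\{(a_i,b_i)\}_{i=1}^{p}$ of $[0,1]$ where $a_i$ and $b_i$ are
multiples of $1/n$ and $|b_i - a_i| \ge d^{-1/4}$, we have:

\[ \Pr\Big[\bigwedge_{i=1}^{p} \big(X_i \in [a_i,b_i]\big)\Big] \ge \Big(\prod_{i=1}^{p}(b_i - a_i)\Big)(1-\delta) - \frac{7p}{d^{1/4}}. \tag{4} \]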


Now, by combining Lemma 1 and Lemma 2 we check the $(p,q,\delta)$-block-independence property.
Start by setting $d = \frac{k}{p}$. By Lemma 2, the error in the probability approximation will be
$O\!\left(\frac{p}{d^{1/4}}\right) = O\!\left(\frac{p^{5/4}}{k^{1/4}}\right)$. This error goes to
zero as $k \to \infty$ if we set $p = o(k^{1/5})$. Moreover, we need $|b_i - a_i| \ge d^{-1/4}$.
Since $d^{1/4} = (k/p)^{1/4} = \omega(k^{1/5})$, if we set $q = O(k^{1/5})$ we are fine. This
completes the proof. □


2.3 Constructions of Probability Distributions Implying the Properties 
2.3.1 Randomized One-Dimensional Projections 

In this section we present one natural construction leading to a distribution that satisfies the
$(k,\delta)$-UIOP. The starting point for the construction is an $n$-tuple of vectors
$x_1,\dots,x_n \in \mathbb{R}^d$. If one sorts these vectors according to a random one-dimensional
projection (i.e., ranks the vectors in increasing order of $w \cdot x_i$, for a random $w$ drawn
from a spherically symmetric distribution), when does the resulting random ordering satisfy the
$(k,\delta)$-UIOP? Note that if any $k$ of these vectors comprise an orthonormal $k$-tuple and one
ranks them in increasing order of $w \cdot x_i$, where $w$ is drawn from a spherically symmetric
distribution, then a trivial symmetry argument shows that the induced ordering of the $k$ vectors
is uniformly random. Intuitively, then, if the vectors $x_1,\dots,x_n$ are sufficiently
“incoherent”, then any $k$-tuple of them should be nearly orthonormal and their induced ordering
when projected onto the 1-dimensional subspace spanned by $w$ should be approximately uniformly
random. The present section is devoted to making this intuition quantitative. We begin by
recalling the definition of the restricted isometry property [^.

Definition 5. A matrix $X$ satisfies the restricted isometry property (RIP) of order $k$ with
restricted isometry constant $\delta_k$ if the inequalities

\[ (1-\delta_k)\|x\|^2 \le \|X_T x\|^2 \le (1+\delta_k)\|x\|^2 \]

hold for every submatrix $X_T$ composed of $|T| \le k$ columns of $X$ and every vector
$x \in \mathbb{R}^{|T|}$. Here $\|\cdot\|$ denotes the Euclidean norm.

Several random matrix distributions are known to give rise to matrices satisfying the RIP with 
high probability. The simplest such distribution is a random $d$-by-$n$ matrix with i.i.d. entries drawn
from the normal distribution AA(0, . It is known [^, that, with high probability, such a matrix 

satisfies the RIP of order k with restricted isometry constant 5 provided that d = ^ . Even 

if the columns $x_1,\dots,x_n$ of $X$ are not random, if they are sufficiently “incoherent” unit
vectors, meaning that $x_i \cdot x_j = 1$ if $i = j$ and $x_i \cdot x_j \le \delta_k/k$ otherwise,
then $X$ satisfies the RIP. Using this idea, we prove the following theorem (with proof provided in
Appendix B.1.3).

Theorem 5. Let xi,...,x„ be the columns of a matrix that satisfies the RIP of order k with 
restricted isometry constant dk = If w is drawn at random from a spherically symmetric distri¬ 
bution and we use w to define a permutation of [n] by sorting its elements in order of increasing 
w ■ Xi, the resulting distribution over Sn satisfies the {k,5)-UIOP. 
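
The construction in Theorem 5 can be sketched in a few lines. The code below is only an illustration: it draws $w$ with i.i.d. standard Gaussian coordinates (one spherically symmetric choice, assumed here) and returns the resulting permutation as a tuple of positions. The function name `projection_order` and the orthonormal example vectors are hypothetical.

```python
import random
from collections import Counter


def projection_order(vectors, rng):
    """Sort items by w . x_i for a random Gaussian direction w.

    Returns a tuple whose entry at index i is the 1-based position of item i.
    """
    dim = len(vectors[0])
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # spherically symmetric
    scores = [sum(wj * xj for wj, xj in zip(w, x)) for x in vectors]
    ranked = sorted(range(len(vectors)), key=lambda i: scores[i])
    positions = [0] * len(vectors)
    for position, item in enumerate(ranked, start=1):
        positions[item] = position
    return tuple(positions)


def ordering_frequencies(vectors, samples, rng):
    """Empirical distribution of the induced permutation."""
    counts = Counter(projection_order(vectors, rng) for _ in range(samples))
    return {perm: c / samples for perm, c in counts.items()}


# Three orthonormal vectors: the induced ordering should be uniform.
basis = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
```

For the orthonormal `basis`, `ordering_frequencies` should return six orderings with frequencies near $1/6$, in line with the symmetry argument above. Feeding in nearly orthogonal vectors instead gives a quick empirical feel for how far the induced ordering drifts from uniform.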

2.3.2 Constructions with Low Entropy 

This subsection presents two constructions showing that there exist permutation distributions with
entropy $\Theta(\log\log n)$ satisfying the $(k,\delta)$-UIOP for arbitrarily large constant $k$
and arbitrarily small constant $\delta$. The proof of the first result is an easy application of
the probabilistic method (see Appendix B.1.4). The proof of the second result uses Reed-Solomon
codes to supply an explicit construction.

Theorem 6. Fix some ^ Inn. If S is a random ^-element multiset of permutations 

$\pi\colon [n] \to [n]$, then the uniform distribution over $S$ fulfills the $(k,\delta)$-UIOP with
probability at least $1 - \frac{1}{n}$.

Theorem 7. There is a distribution over permutations that has entropy $O(\log\log n)$ and fulfills
the $(k,\delta)$-uniform-induced-ordering property where
$\delta = O\!\left(\frac{k^2}{\log\log\log n}\right)$.

To derive Theorem 7, we start by proving the following lemma.

Lemma 3. For large enough $n \in \mathbb{N}$ and some $\ell = O(\log^2 \log\log n)$, there is a
distribution over functions $f\colon \mathcal{U} \to [\ell]$ with entropy $O(\log\log n)$ such that
for any $x, x' \in \mathcal{U}$, $x \ne x'$, we have
$\Pr[f(x) = f(x')] = O\!\left(\frac{1}{\log\log\log n}\right)$.

Proof. We will define a function $f$, parameterized by $a_1$, $a_2$, and $a_3$, as a composition of
8 functions, which are mostly injective.

For $i = 1,2,3$, let $K_i = \log^{(i)} n$ (the $i$-times iterated logarithm) and let $q_i$ be a
prime power such that $2K_i^2 + 1 \ge q_i \ge K_i^2 + 1$. (Note that for large enough $n$, we can
always find a prime power between $K_i^2 + 1$ and $2K_i^2 + 1$.) Let $a_i$ be drawn independently
and uniformly from $[q_i - 1]$. This is the only randomization involved in the construction. It
has entropy $\log(q_1 - 1) + \log(q_2 - 1) + \log(q_3 - 1)$.

Let $C_i$ be a Reed-Solomon code of message length $K_i$ and alphabet size $q_i$. This yields block
length $N_i = q_i - 1$ and distance $d_i = N_i - K_i + 1 = q_i - K_i$. In other words, $C_i$ is a
function $C_i\colon D_i \to R_i$ with $D_i = [q_i]^{K_i}$ and $R_i = [q_i]^{N_i}$ such that for any
$w, w' \in D_i$ with $w \ne w'$, $C_i(w)$ and $C_i(w')$ differ in at least $d_i$ components.

Furthermore, $a_i$ defines one position in each code-word in $R_i$. Given $a_i$, let
$h_i\colon R_i \to [q_i]$ be the projection of a code-word $w$ of $C_i$ to its $a_i$-th component,
i.e., $h_i(w) = w_{a_i}$.

Finally, we observe that $|D_{i+1}| = q_{i+1}^{K_{i+1}} \ge 2(2^{K_{i+1}})^2 + 1 \ge q_i$. So there
is an injective mapping $g_i\colon [q_i] \to D_{i+1}$ mapping alphabet symbols of $C_i$ to messages
of $C_{i+1}$.

Overall, this defines a function
$f = h_3 \circ C_3 \circ g_2 \circ h_2 \circ C_2 \circ g_1 \circ h_1 \circ C_1$, mapping values of
$D_1$ to $[q_3]$.

Let $f_i = g_i \circ h_i \circ C_i \circ f_{i-1}$, where $f_0$ is the identity.

Now let $w, w' \in D_1$, $w \ne w'$. Observe that all functions except for the $h_i$ are injective.
Therefore the event $f(w) = f(w')$ can only occur if
$h_i(C_i(f_{i-1}(w))) = h_i(C_i(f_{i-1}(w')))$ for some $i$. As $C_i$ is a Reed-Solomon code with
distance $d_i$, $C_i(f_{i-1}(w))$ and $C_i(f_{i-1}(w'))$ differ in at least $d_i$ components.
Therefore, the probability that $h_i(C_i(f_{i-1}(w))) \ne h_i(C_i(f_{i-1}(w')))$ is at least
$\frac{d_i}{N_i}$.

By union bound, the combined probability that this does not hold for one $i$ is bounded by

\[ \Pr\Big[\bigvee_{i=1}^{3} h_i(C_i(f_{i-1}(w))) = h_i(C_i(f_{i-1}(w')))\Big] \le \sum_{i=1}^{3}\Big(1 - \frac{d_i}{N_i}\Big) \le \frac{3}{K_3}. \]

□
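
One level of the hashing step in this proof can be illustrated with a toy Reed-Solomon code. The sketch below is a simplification under stated assumptions: it works over a prime field $\mathbb{Z}_q$ (the proof allows prime powers), evaluates the message polynomial at the nonzero field elements so that the block length is $q - 1$, and uses hypothetical helper names `rs_encode`, `project`, and `max_agreement`.

```python
from itertools import combinations, product


def rs_encode(message, q):
    """Reed-Solomon codeword of `message` over Z_q, assuming q is prime.

    The message (c_0, ..., c_{K-1}) is read as the polynomial
    c_0 + c_1 x + ... and evaluated at the points 1, ..., q-1.
    """
    codeword = []
    for point in range(1, q):
        value = 0
        for coeff in reversed(message):  # Horner's rule
            value = (value * point + coeff) % q
        codeword.append(value)
    return codeword


def project(message, q, a):
    """The hash h(C(message)): the a-th codeword symbol, a in 1..q-1."""
    return rs_encode(message, q)[a - 1]


def max_agreement(q, K):
    """Largest number of positions on which two distinct codewords agree."""
    codewords = [rs_encode(m, q) for m in product(range(q), repeat=K)]
    return max(
        sum(u == v for u, v in zip(c1, c2))
        for c1, c2 in combinations(codewords, 2)
    )
```

Two distinct messages of length $K$ agree on at most $K - 1$ codeword positions, so a uniformly random position `a` makes them collide with probability at most $(K-1)/(q-1)$. For instance, `max_agreement(7, 3)` is 2, giving distance $6 - 2 = 4 = N - K + 1$ for block length $N = 6$.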

Proof of Theorem 7. By the above lemma, there are constants $c_1, c_2, c_3$ such that the following
condition is fulfilled. For some $\ell = c_1 \log^2 \log\log n$, there is a distribution over
functions $f\colon \mathcal{U} \to [\ell]$ with entropy $c_2 \log\log n$ such that for any
$x, x' \in \mathcal{U}$, $x \ne x'$, we have
$\Pr[f(x) = f(x')] \le \frac{c_3}{\log\log\log n}$. Draw a permutation
$\pi'\colon [\ell] \to [\ell]$ uniformly at random and define the permutation
$\pi\colon \mathcal{U} \to [n]$ by using $\pi' \circ f$ and extending it to a full permutation
arbitrarily.

Let $x_1,\dots,x_k$ be distinct items from $\mathcal{U}$. Conditioned on $f(x_i) \ne f(x_j)$ for
all $i \ne j$, we have $\pi(x_1) < \pi(x_2) < \dots < \pi(x_k)$ with probability $\frac{1}{k!}$.
Furthermore, applying a union bound over the at most $k^2$ pairs in combination with the above
lemma, the probability that there is some pair $i \ne j$ with $f(x_i) = f(x_j)$ is at most
$\frac{k^2 c_3}{\log\log\log n}$. Therefore, the overall probability that
$\pi(x_1) < \pi(x_2) < \dots < \pi(x_k)$ is at least
$\left(1 - \frac{k^2 c_3}{\log\log\log n}\right)\frac{1}{k!}$.

The entropy of the distribution that determines $\pi$ is
$c_2 \log\log n + \log(\ell!) = O(\log\log n)$. □


3 Tight Bound on Entropy of Distribution 

One of the consequences of the previous section is the fact that there are s-admissible (in fact,
even s-optimal) distributions with entropy $O(\log\log n)$. In this section, we show that this
bound is actually tight: no probability distribution of entropy $o(\log\log n)$ is s-admissible.
The crux of the proof lies in defining a notion of “semitone sequences” (sequences which satisfy a
property similar to, but weaker than, monotonicity) and showing that an adversary can exploit the
existence of long semitone sequences to force every algorithm to have a low probability of success.

Theorem 8. A permutation distribution $\boldsymbol{\pi}$ of entropy $H = o(\log\log n)$ cannot be
s-admissible.







Here is the proof sketch. We use the fact for distributions of entropy H there is a subset of 
the support of size k that is selected with probability at least 1 — iog(^ 3 ) • It then suffices to show 
that if the distribution’s support size is at most k, then any algorithm’s probability of success 
against a worst-case adversary is at most The theorem then follows by setting k = ^/\ogn. 

To bound the algorithm’s probability of success, we introduce the notion of semitone sequences, defined recursively as follows: an empty sequence is semitone with respect to any permutation $\pi$, and a sequence $(x_1, \ldots, x_s)$ is semitone w.r.t. $\pi$ if $\pi(x_s) \in \{\min_{i \in [s]} \pi(x_i), \max_{i \in [s]} \pi(x_i)\}$ and $(x_1, \ldots, x_{s-1})$ is semitone w.r.t. $\pi$. We will show that given $k$ arbitrary permutations of $[n]$, there is always a sequence of length $\frac{\log n}{k+1}$ that is semitone with respect to all $k$ permutations. Later on, we show how an adversary can exploit this sequence to make any algorithm’s success probability small. To make the above arguments concrete, we start with the following lemma.

Lemma 4. Suppose $\Pi = \{\pi_1, \ldots, \pi_k\}$, where each $\pi_i$ is a permutation over $U$. Then there exists a sequence $(x_1, \ldots, x_s)$ that is semitone with respect to each $\pi_i$ and $s \geq \frac{\log n}{k+1}$.

Proof. For a fixed permutation $\pi_i$ and a fixed item $y \in U$, we define a function $h_i^y: U \setminus \{y\} \to \{0,1\}$ that indicates whether $\pi_i$ maps $x$ to a higher position than $y$ or not. Formally,

\[
h_i^y(x) =
\begin{cases}
0 & \text{if } \pi_i(x) < \pi_i(y) \\
1 & \text{if } \pi_i(x) > \pi_i(y)
\end{cases}
\]

Still keeping one item $y \in U$ fixed, we now get a $k$-dimensional vector by concatenating the values for different $\pi_i$. This way, we obtain a hash function $h^y: U \setminus \{y\} \to \{0,1\}^k$, where $h^y(x) = (h_1^y(x), \ldots, h_k^y(x))$.

Starting from $U^{(0)} = U$, we now construct a sequence of nested subsets $U^{(0)} \supseteq U^{(1)} \supseteq \ldots$ iteratively. At iteration $t+1$, given set $U^{(t)} \neq \emptyset$, we do the following. For an arbitrary element $x_{s-t}$ of $U^{(t)}$, we hash each element of $U^{(t)} \setminus \{x_{s-t}\}$ to a value in $\{0,1\}^k$ by using $h^{x_{s-t}}$. Now $U^{(t+1)} \subseteq U^{(t)} \setminus \{x_{s-t}\}$ is defined to be the set of occupants of the most occupied hash bucket in $\{0,1\}^k$.

Note that if we place $x_{s-t}$ at the end of any semitone sequence in $U^{(t+1)}$ it will remain semitone with respect to each $\pi_i$. This in turn implies that for any $t'$ the sequence $(x_1, \ldots, x_{t'})$ is semitone with respect to all $\pi_i$.

It now remains to bound the length of the sequence $(x_1, \ldots, x_s)$ we are able to generate. We achieve length $s$ if and only if $U^{(s)}$ is the first empty set. At iteration $t$ of the above construction, we have $|U^{(t)}| - 1$ elements to hash and we have $2^k$ hash buckets, so $|U^{(t+1)}| \geq (|U^{(t)}| - 1) 2^{-k} \geq |U^{(t)}| 2^{-(k+1)}$ and therefore $|U^{(t)}| \geq 2^{-t(k+1)} |U^{(0)}| = 2^{-t(k+1)} n$. As $|U^{(s)}| < 1$, this implies $2^{-s(k+1)} n < 1$. So $s \geq \frac{\log n}{k+1}$. □
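The construction in this proof is short enough to sketch in code. The following Python sketch is illustrative only: the function names `is_semitone` and `semitone_sequence`, and the representation of each permutation as a dictionary from item to arrival position, are assumptions made for this example and do not appear in the paper.

```python
from collections import defaultdict


def is_semitone(seq, pi):
    """Check the recursive definition: every prefix of `seq` must end in the
    item that arrives first or last (under `pi`) among the items of that prefix."""
    for s in range(1, len(seq) + 1):
        positions = [pi[x] for x in seq[:s]]
        if positions[-1] not in (min(positions), max(positions)):
            return False
    return True


def semitone_sequence(items, perms):
    """Greedy bucket construction from the proof of Lemma 4.

    `items` is the ground set U; `perms` is a list of dicts item -> position.
    Returns a sequence that is semitone with respect to every dict in `perms`.
    """
    remaining = list(items)              # plays the role of U^(t)
    picked = []                          # x_s, x_{s-1}, ... in order of selection
    while remaining:
        y = remaining[0]                 # an arbitrary element of U^(t)
        buckets = defaultdict(list)
        for x in remaining[1:]:
            # h^y(x): one bit per permutation, 1 if x arrives after y
            key = tuple(int(pi[x] > pi[y]) for pi in perms)
            buckets[key].append(x)
        picked.append(y)
        # U^(t+1) is the most occupied hash bucket
        remaining = max(buckets.values(), key=len) if buckets else []
    picked.reverse()                     # the element picked first is x_s
    return picked
```

Because all items surviving an iteration share a bucket, they all lie on the same side of the chosen item in every permutation, which is exactly the semitone condition for the last element of each prefix.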

We now turn to showing that an adversary can exploit a semitone sequence and force any algorithm to have success probability at most $\frac{1}{s}$. To show this we look at the performance of the best deterministic algorithm against a particular distribution over assignments of values to items.

Lemma 5. Let $V = \{1, 2, \ldots, s\}$. Assign values from $V$ to items $(x_1, \ldots, x_s)$ at random by

\[
\mathrm{value}(x_s) =
\begin{cases}
\max(V) & \text{with probability } 1/s \\
\min(V) & \text{with probability } 1 - 1/s
\end{cases}
\]

and then assigning values from $V \setminus \{\mathrm{value}(x_s)\}$ to items $(x_1, \ldots, x_{s-1})$ recursively. Assign a value 0 to all other items.

Consider an arbitrary algorithm following permutation $\pi$ such that $(x_1, \ldots, x_s)$ is semitone with respect to $\pi$. This algorithm selects the best item with probability at most $\frac{1}{s}$.
Proof. Fixing some (deterministic) algorithm and permutation $\pi$, let $A_t$ be the event that the algorithm selects any item among $x_1, \ldots, x_t$ and let $B_t$ be the event that the algorithm selects the best item among $x_1, \ldots, x_t$. We will show by induction that $\Pr[B_t] = \frac{1}{t} \Pr[A_t]$. This will imply $\Pr[B_s] = \frac{1}{s} \Pr[A_s] \leq \frac{1}{s}$.

For $t = 1$ this statement trivially holds. Therefore, let us consider some $t > 1$. By induction hypothesis, we have $\Pr[B_{t-1}] = \frac{1}{t-1} \Pr[A_{t-1}]$. As $(x_1, \ldots, x_t)$ is semitone with respect to $\pi$, $x_t$ either comes before or after all of $x_1, \ldots, x_{t-1}$. We distinguish these two cases.

Case 1: $x_t$ comes before all of $x_1, \ldots, x_{t-1}$. The algorithm can decide to accept $x_t$ (without seeing the items $x_1, \ldots, x_{t-1}$). In this case, we have $A_t$ for sure. We only have $B_t$ if $x_t$ gets a higher value than $x_1, \ldots, x_{t-1}$. By definition this happens with probability $\frac{1}{t}$. So, we have $\Pr[B_t] = \frac{1}{t} \Pr[A_t]$. The algorithm can also decide to reject $x_t$. Then $A_t$ holds if and only if $A_{t-1}$ holds. Furthermore, $B_t$ holds if and only if $B_{t-1}$ holds and $x_t$ does not get the highest value among $x_1, \ldots, x_t$. These events are independent, so $\Pr[B_t] = (1 - \frac{1}{t}) \Pr[B_{t-1}]$. Applying the induction hypothesis, we get $\Pr[B_t] = (1 - \frac{1}{t}) \Pr[B_{t-1}] = \frac{t-1}{t} \cdot \frac{1}{t-1} \Pr[A_{t-1}] = \frac{1}{t} \Pr[A_t]$.

Case 2: $x_t$ comes after all of $x_1, \ldots, x_{t-1}$. When the algorithm comes to $x_t$, it may or may not have selected an item so far. If it has already selected an item ($A_{t-1}$), then this element is the best among $x_1, \ldots, x_{t-1}$ with probability $\Pr[B_{t-1} \mid A_{t-1}] = \frac{1}{t-1}$ by induction hypothesis. Independent of these events, $x_t$ is worse than the best item among $x_1, \ldots, x_{t-1}$ with probability $1 - \frac{1}{t}$. Therefore, we get $\Pr[B_t \mid A_{t-1}] = \frac{1}{t-1} (1 - \frac{1}{t}) = \frac{1}{t}$. It remains to consider the case that the algorithm selects item $x_t$ ($A_t \setminus A_{t-1}$). This item is better than $x_1, \ldots, x_{t-1}$ with probability $\frac{1}{t}$. That is, $\Pr[B_t \mid A_t \setminus A_{t-1}] = \frac{1}{t}$. In combination, we have $\Pr[B_t] = \Pr[A_{t-1}] \Pr[B_t \mid A_{t-1}] + \Pr[A_t \setminus A_{t-1}] \Pr[B_t \mid A_t \setminus A_{t-1}] = \Pr[A_{t-1}] \frac{1}{t} + \Pr[A_t \setminus A_{t-1}] \frac{1}{t} = \frac{1}{t} \Pr[A_t]$. □
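The random value assignment of Lemma 5 can be sampled directly. The sketch below is an illustration under assumptions: the name `sample_values` and the list-based output format are invented for this example. It processes $x_s, x_{s-1}, \ldots, x_1$ in turn and uses the fact that the remaining values always form a contiguous range.

```python
import random


def sample_values(s, rng=random):
    """Sample the value assignment of Lemma 5 for items x_1, ..., x_s.

    Returns a list `value` where value[t - 1] is the value of x_t. Item x_t
    receives the largest remaining value with probability 1/t and the
    smallest remaining value otherwise.
    """
    lo, hi = 1, s                        # remaining values are lo, lo+1, ..., hi
    value = [0] * s
    for t in range(s, 0, -1):
        if rng.random() < 1.0 / t:       # x_t becomes the best of x_1..x_t
            value[t - 1] = hi
            hi -= 1
        else:                            # x_t becomes the worst of x_1..x_t
            value[t - 1] = lo
            lo += 1
    return value
```

By construction, $x_t$ holds the highest value among $x_1, \ldots, x_t$ with probability exactly $1/t$, which is the fact used in both cases of the induction.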

Now, to show Theorem 8, we first give a bound in terms of the support size of the distribution. In fact, Lemmas 4 and 5 with Yao’s principle then imply that any algorithm’s probability of success against a worst-case adversary is at most $\frac{k+1}{\log n}$ (details of the proof are in Appendix B.2). Later on, we will show how this transfers to a bound on the entropy.

Lemma 6. If $\pi: U \to [n]$ is chosen from a distribution of support size at most $k$, then any algorithm’s probability of success against a worst-case adversary is at most $\frac{k+1}{\log n}$.

To get a bound on the entropy, we show that for a low-entropy distribution there is a small subset of the support that is selected with high probability. More precisely, we have the following technical lemma whose proof can be found in Appendix B.2.

Lemma 7. Let $a$ be drawn from a finite set $V$ by a distribution of entropy $H$. Then for any $k \geq 4$ there is a set $T \subseteq V$, $|T| \leq k$, such that $\Pr[a \in T] \geq 1 - \frac{8H}{\log(k-3)}$.

Finally, Theorem 8 is proven as a combination of Lemma 6 and Lemma 7.

Proof of Theorem 8. Set $k = \sqrt{\log n}$. Lemma 7 shows that there is a set of permutations $\Pi$ of size at most $k$ that is chosen with probability at least $1 - \frac{8H}{\log(k-3)}$. The distribution conditioned on $\pi$ being in $\Pi$ has support size at most $k$. Lemma 6 shows that if $\pi$ is chosen by a distribution of support size at most $k$, then the probability of success of any algorithm against a worst-case adversary is at most $\frac{k+1}{\log n}$. Therefore, we get

\[
\begin{aligned}
\Pr[\text{success}] &= \Pr[\pi \in \Pi] \Pr[\text{success} \mid \pi \in \Pi] + \Pr[\pi \notin \Pi] \Pr[\text{success} \mid \pi \notin \Pi] \\
&\leq \Pr[\text{success} \mid \pi \in \Pi] + \Pr[\pi \notin \Pi] \\
&\leq \frac{k+1}{\log n} + \frac{8H}{\log(k-3)} \\
&= o(1).
\end{aligned}
\]

□


4 Easy Distributions Are Hard to Characterize 

Which distributions are s-admissible, meaning that they allow an algorithm to achieve constant probability of correct selection in the secretary problem? The results in the preceding sections inspire hope that the $(k, \delta)$-UIOP, the $(p, q, \delta)$-BIP, or something very similar, is both necessary and sufficient for s-admissibility. Unfortunately, in this section we show that, in some sense, it is hopeless to try formulating a comprehensible condition that is both necessary and sufficient. We construct a family of distributions $\underline{\pi}$ with associated algorithms ALG having constant success probability when the items are randomly ordered according to $\underline{\pi}$, but the complicated and unnatural structure of the distribution and algorithm underscores the pointlessness of precisely characterizing s-admissible distributions. In more objective terms, we construct a $\underline{\pi}$ which is s-admissible, yet for any algorithm
whose stopping rule is computable by circuits of size less than the probability of correct 

selection is o(l). 

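Throughout this section (and its corresponding appendix) we will summarize the adversary’s assignment of values to items by a permutation $\sigma$; the $j$-th largest value is assigned to item $\sigma(j)$. If $\underline{\sigma}$ is any probability distribution over such permutations, we will let $V^{\underline{\pi}}(\mathrm{ALG}, \underline{\sigma})$ denote the probability that ALG makes a correct selection when the adversary samples the value-to-item assignment from $\underline{\sigma}$, and nature independently samples the item-to-time-slot assignment from $\underline{\pi}$. We will also let

\[
\begin{aligned}
V^{\underline{\pi}}(*, \underline{\sigma}) &= \max_{\mathrm{ALG}} V^{\underline{\pi}}(\mathrm{ALG}, \underline{\sigma}) \\
V^{\underline{\pi}}(\mathrm{ALG}, *) &= \min_{\underline{\sigma}} V^{\underline{\pi}}(\mathrm{ALG}, \underline{\sigma}) \\
V^{\underline{\pi}} &= \min_{\underline{\sigma}} \max_{\mathrm{ALG}} V^{\underline{\pi}}(\mathrm{ALG}, \underline{\sigma}).
\end{aligned}
\]

Thus, for example, the property that $\underline{\pi}$ is s-admissible is expressed by the formula $V^{\underline{\pi}} = \Omega(1)$.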

As a preview of the techniques underlying our construction, it is instructive to first consider a game against nature in which there is no adversary, and the algorithm is simply trying to pick out the maximum element when items numbered in order of decreasing value arrive in the random order specified by $\underline{\pi}$. This amounts to determining $V^{\underline{\pi}}(*, \iota)$, where $\iota$ is the distribution that assigns probability 1 to the identity permutation. Our construction is based on the following intuition. In the secretary problem with uniformly random arrival order, the arrival order of items that arrived before time $t$ is uncorrelated with the order in which items arrive after time $t$, and so the ordering of past elements is irrelevant to the question of whether to stop at time $t$. However, there is a great deal of entropy in the ordering of elements that arrived before time $t$; it encodes $\Theta(t \log t)$ bits of information. We will construct a distribution $\underline{\pi}$ in which this information contained in the ordering of the elements that arrived before time $t = n/2$ fully encodes the time when the maximum element will arrive after time $t$, but in an “encrypted” way that cannot be decoded by polynomial-sized circuits. We will make use of the well-known fact that a random function is hard on average for circuits of subexponential size.

Lemma 8. If $g: \{0,1\}^n \to [k]$ is a random function, then with high probability there is no circuit of size $s(n) = 2^n/(8kn)$ that outputs the function value correctly on more than a $\frac{2}{k}$ fraction of inputs.

The simple proof of Lemma 8 is included in the appendix, for reference.

Theorem 9. There exists a family of distributions $\underline{\pi} \in \Delta(S_n)$ such that $V^{\underline{\pi}}(*, \iota) = 1$, but for any algorithm ALG whose stopping rule can be computed by circuits of size $s(n) = 2^{n/8}$, we have $V^{\underline{\pi}}(\mathrm{ALG}, \iota) = O(1/n)$.

Proof. Assume for convenience that $n$ is divisible by 4. Fix a function $g: \{0,1\}^{n/4} \to [n/2]$ such that no circuit of size $s(n) = 2^{n/4}/n^2$ outputs the value of $g$ correctly on more than a $\frac{4}{n}$ fraction of inputs. The existence of such functions is ensured by Lemma 8. We use $g$ to define a permutation distribution $\underline{\pi}$ as follows. For any binary string $x \in \{0,1\}^{n/4}$, define a permutation $\pi(x)$ by performing the following sequence of operations. First, rearrange the items in order of increasing value by mapping item $i$ to position $n - i + 1$ for each $i$. Next, for $i = 1, \ldots, \frac{n}{4}$, swap the items in positions $i$ and $i + \frac{n}{4}$ if and only if $x_i = 1$. Finally, swap the items in positions $n$ and $\frac{n}{2} + g(x)$. (Note that this places the maximum-value item in position $\frac{n}{2} + g(x)$.) The permutation distribution $\underline{\pi}$ is the uniform distribution over $\{\pi(x) \mid x \in \{0,1\}^{n/4}\}$.

It is easy to design an algorithm which always selects the item of maximum value when the input sequence $\pi$ is sampled from $\underline{\pi}$. The algorithm first decodes the unique binary string $x$ such that $\pi = \pi(x)$, by comparing the items arriving at times $i$ and $i + \frac{n}{4}$ for each $i$ and setting the bit $x_i$ according to the outcome of this comparison. Having decoded $x$, we then compute $g(x)$ and select the item that arrives at time $\frac{n}{2} + g(x)$. By construction, when $\pi$ is drawn from $\underline{\pi}$ this is always the element of maximum value.

Finally, if ALG is any secretary algorithm we can attempt to use ALG to guess the value of $g(x)$ for any input $x \in \{0,1\}^{n/4}$ by the following simulation procedure. First, define a permutation $\pi'(x)$ by performing the same sequence of operations as in $\pi(x)$ except for the final step of swapping the items in positions $n$ and $n/2 + g(x)$; note that this means that $\pi'(x)$, unlike $\pi(x)$, can be constructed from input $x$ by a circuit of polynomial size. Now simulate ALG on the input sequence $\pi'(x)$, observe the time $t$ when it selects an item, and output $t - \frac{n}{2}$. The circuit complexity of this simulation procedure is at most $\mathrm{poly}(n)$ times the circuit complexity of the stopping rule implemented by ALG, and the fraction of inputs $x$ on which it guesses $g(x)$ correctly is precisely $V^{\underline{\pi}}(\mathrm{ALG}, \iota)$. (To verify this last statement, note that ALG makes its selection at time $t = \frac{n}{2} + g(x)$ when observing input sequence $\pi(x)$ if and only if it also makes its selection at time $t$ when observing input sequence $\pi'(x)$, because the two input sequences are indistinguishable to comparison-based algorithms at that time.) Hence, if $V^{\underline{\pi}}(\mathrm{ALG}, \iota) > \frac{4}{n}$ then the stopping rule of ALG cannot be implemented by circuits of size $2^{n/8}$. □
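To make the encoding in this proof concrete, here is a small Python sketch of the construction of $\pi(x)$ and of the decoding algorithm. It is only an illustration: `build_order`, `select`, and `toy_g` are hypothetical names, and `toy_g` is an easily computable stand-in, whereas the theorem requires a $g$ that is hard on average for small circuits.

```python
def build_order(x, g):
    """Arrival order pi(x): order[t - 1] is the item arriving at time t.

    Items are numbered 1..n in order of decreasing value (item 1 is the best),
    and x is a tuple of n/4 bits.
    """
    n = 4 * len(x)
    quarter = n // 4
    order = list(range(n, 0, -1))        # item i sits at position n - i + 1
    for i in range(1, quarter + 1):      # swap positions i and i + n/4 iff x_i = 1
        if x[i - 1] == 1:
            a, b = i - 1, i + quarter - 1
            order[a], order[b] = order[b], order[a]
    t = n // 2 + g(x)                    # final swap moves item 1 to time n/2 + g(x)
    order[n - 1], order[t - 1] = order[t - 1], order[n - 1]
    return order


def select(order, g):
    """Decode x by comparisons within the first half, then stop at n/2 + g(x)."""
    n = len(order)
    quarter = n // 4
    # A smaller item number means a larger value, so this is a value comparison.
    x = tuple(1 if order[i] < order[i + quarter] else 0 for i in range(quarter))
    return order[n // 2 + g(x) - 1]


def toy_g(x):
    """Hypothetical stand-in for g, with values in {1, ..., n/2} where n = 4 * len(x)."""
    return 1 + sum(bit << j for j, bit in enumerate(x)) % (2 * len(x))
```

For every bit string `x`, `select(build_order(x, toy_g), toy_g)` returns item 1, mirroring the claim that $V^{\underline{\pi}}(*, \iota) = 1$; the hardness half of the theorem depends entirely on the choice of $g$ and is not captured by this sketch.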

Our main theorem in this section derives essentially the same result for the standard game-against-adversary interpretation of the secretary problem, rather than the game-against-nature interpretation adopted in Theorem 9.

Theorem 10. For any function K{n) such that lim„^oo i^in) = 0 while lim^^oo = oo, there 

exists a family of distributions tt G A(S'n) such that V— = 0(1), but any algorithm ALG whose 
stopping rule can be computed by circuits of size s(n) = satisfies H-(alg, *) = 0(K(n)). 

The full proof is provided in Appendix B.3. Here we sketch the main ideas.
Proof sketch. As in Theorem 9, the algorithm and “nature” (i.e., the process sampling the input order) will work in concert with each other to bring about a correct selection, using a form of coordination that is information-theoretically easy but computationally hard. The difficulty lies in the fact that the adversary is simultaneously working to thwart their efforts. If nature, for example, wishes to use the first half of the input sequence to “encrypt” the position where item 1 will be located in the second half of the sequence, then the adversary is free to assign the maximum value to item 2 and a random value to item 1, rendering the encrypted information useless to the algorithm.

Thus, our construction of the permutation distribution $\underline{\pi}$ and algorithm ALG will be guided by two goals. First, we must “tie the adversary’s hands” by ensuring that ALG has constant probability of correct selection unless the adversary’s permutation, $\sigma$, is in some sense “close” to the identity permutation. Second, we must ensure that ALG has constant probability of correct selection whenever $\sigma$ is close to the identity, not only when it is equal to the identity as in Theorem 9. To accomplish the second goal we modify the construction in Theorem 9 so that the first half of the input sequence encodes the binary string $x$ using an error-correcting code. To accomplish the first goal we define $\underline{\pi}$ to be a convex combination of two distributions: the “encrypting” distribution described earlier, and an “adversary-coercing” distribution designed to make it easy for the algorithm to select the maximum-value element unless the adversary’s permutation $\sigma$ is close to the identity in an appropriate sense. □


5 Extensions Beyond Classic Secretary Problem 


We look at two generalizations of the classic secretary problem in this section, namely the multiple- 
choice secretary problem, studied in [^, and the online weighted bipartite matching problem, studied 
extensively in 2g, 23], under our non-uniform permutation distributions. We give a positive result 
showing that a natural variant of the algorithm in achieves a (1 — o(l))-competitive ratio under 
our pseudo-random properties defined in ^ while for the latter we show the algorithm proposed 
by [2^ fails to achieve any constant competitive ratio under our pseudo-random properties. 


Multiple-choice secretary problem We consider the multiple-choice secretary problem (a.k.a. the $k$-uniform matroid secretary problem). In this setting, not just a single secretary has to be selected but up to $k$. An algorithm observes items with non-negative values based on the ordering $\pi: U \to [n]$ and chooses at most $k$ items in an online fashion. The goal is to maximize the sum of values of selected items. We consider distributions over permutations $\pi$ that fulfill the $(p, q, \delta)$-BIP, for some $p \geq k$. We show that a slight adaptation of an existing algorithm achieves competitive ratio $1 - o(1)$, for large enough values of $k$ and $q$ and small enough $\delta$.

The algorithm is defined recursively. We denote by $\mathrm{ALG}(n', k', q')$ the call of the algorithm that operates on the prefix of length $n'$ of the input. It is allowed to choose $k'$ items and expects $q'$ blocks. For $k' = 1$, $\mathrm{ALG}(n', k', q')$ is simply the standard secretary algorithm that we analyzed in Section 2.1. For $k' > 1$, the algorithm first draws a random number $\tau(q')$ from a binomial distribution $\mathrm{Binom}(q', \frac{1}{2})$ and then executes $\mathrm{ALG}(\frac{\tau(q')}{q'} n', \lfloor k'/2 \rfloor, \tau(q'))$. After round $\frac{\tau(q')}{q'} n'$ (we assume $n'$ is always a multiple of $q'$), the algorithm accepts every item whose value is greater than the $\lfloor k'/2 \rfloor$-highest item that arrived during rounds $1, \ldots, \frac{\tau(q')}{q'} n'$, until $k'$ items are selected by the algorithm or until round $n'$. The output is the union of all items returned by the recursive call and all items the algorithm picked after the threshold round. We now have the following theorem, which is proved in the appendix.









Theorem 11. Suppose vr is drawn from a permutation distribution that satisfies {p,q,6)-BIP for 
some p> k and 6 < ^. Then for all permutations a, ALG(i^, k, q) is (1 — O(^) — e)-competitive 
for the k-uniform matroid secretary problem, where e can be arbitrary small for large enough value 
of q and small enough value of 5. 
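The recursive algorithm $\mathrm{ALG}(n', k', q')$ described above can be sketched in Python as follows. This is a sketch under stated assumptions, not the paper's exact procedure: the names `classic_secretary` and `multi_choice` are invented here, the base case uses the classical "skip the first $\lfloor n/e \rfloor$ items" rule as a stand-in for the block-based rule of Section 2.1, and when the sample phase contains fewer than $\lfloor k'/2 \rfloor$ items the threshold is taken to be $-\infty$ (a choice the text does not specify).

```python
import math
import random


def classic_secretary(values, n):
    """Base case (assumed stand-in): skip floor(n/e) items, then take the first record."""
    skip = int(n / math.e)
    best_seen = max(values[:skip], default=float("-inf"))
    for t in range(skip, n):
        if values[t] > best_seen:
            return [t]
    return []


def multi_choice(values, n, k, q, rng):
    """Return 0-based arrival times of at most k items chosen among the first n.

    Assumes that q divides n, as in the text.
    """
    if n == 0 or k == 0:
        return []
    if k == 1:
        return classic_secretary(values, n)
    tau = sum(rng.random() < 0.5 for _ in range(q))    # draw from Binom(q, 1/2)
    m = tau * (n // q)                                 # last round of the sample phase
    chosen = multi_choice(values, m, k // 2, tau, rng)
    sample = sorted(values[:m], reverse=True)
    half = k // 2
    # Value of the floor(k/2)-highest sample item; -inf if the sample is too small.
    threshold = sample[half - 1] if len(sample) >= half else float("-inf")
    for t in range(m, n):
        if len(chosen) >= k:
            break
        if values[t] > threshold:
            chosen.append(t)
    return chosen
```

A call such as `multi_choice(values, len(values), k, q, random.Random(0))` returns the indices of the selected items. The recursion halves the budget at each level, so its depth is about $\log_2 k$.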


Online weighted bipartite matching Next, we consider online weighted bipartite matching, where the vertices on the offline side of a bipartite graph are given in advance and the vertices on the online side arrive online in a random order (not necessarily uniform). Whenever a vertex arrives, its adjacent edges with the corresponding weights are revealed and the online algorithm has to decide which of these edges should be included in the matching. The objective is to maximize the weight of the matching selected by the online algorithm. A celebrated result of Korula and Pál shows the existence of a constant-competitive online algorithm under uniform random order of arrival; nevertheless, this algorithm does not achieve any constant competitive ratio under our non-uniform assumptions for permutation distributions.

Theorem 12. For every k and 5, there is an instance and a probability distribution that fulfills the 
{k, 6)-uniform-induced-ordering property such that the competitive ratio of the Korula-Pdl algorithm 
is at least Q 


6 Conclusion 


In this paper we have studied how secretary algorithms perform when the arrival order satisfies relaxations of the uniform-random-order hypothesis. We presented a pair of closely related properties (the $(k, \delta)$-UIOP and the $(p, q, \delta)$-BIP) that ensure that the standard secretary algorithm has constant probability of correct selection, and we derived some results on the minimum amount of entropy and the minimum circuit complexity necessary to achieve constant probability of correct selection in secretary problems with non-uniform arrival order.

We believe this work represents a first step toward obtaining a deeper understanding of the amount and type of randomness required to obtain strong performance guarantees for online algorithms. The next step is to expand this study beyond the setting of secretary problems. A very promising domain for future investigation is online packing LP and its generalization, online convex programming. Our positive result on the uniform matroid secretary problem constitutes a first step toward obtaining a general positive result confirming that existing algorithms such as the algorithms of [24] and [1] preserve their performance guarantees when the input ordering satisfies the $(k, \delta)$-UIOP or some other relaxation of the uniform randomness assumption.
A A secretary algorithm for the $(3, \delta)$-induced-ordering property


Theorem 13. If a probability distribution fulfills the $(3, \delta)$-induced-ordering property, there is an algorithm for the secretary problem that selects the best item with probability at least $\frac{(1-\delta)^2}{6(1+\delta)}$.

Proof. Consider the following algorithm: First we draw a threshold $r$ uniformly at random. Then we observe all items until round $r$. After round $r$, we accept the first item that is better than all items seen so far.

To analyze this algorithm, let $x_1, x_2, \ldots, x_n$ be the items in order of decreasing value. To select $x_1$ it suffices that $x_2$ comes until round $r$ and $x_1$ comes after round $r$. For $i \geq 3$, let $Y_i$ be a 0/1 random variable indicating whether $\pi(x_2) < \pi(x_i) < \pi(x_1)$.

Conditioned on $\sum_{i=3}^{n} Y_i = a$ and $\pi(x_2) < \pi(x_1)$, the probability that $x_2$ comes until round $r$ and $x_1$ comes after round $r$ is exactly $\frac{a+1}{n}$ because there are $a$ items coming between $x_2$ and $x_1$, giving $a + 1$ positive outcomes for $r$.

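We have $E[Y_i] \geq (1-\delta) \frac{1}{3!} = \frac{1-\delta}{6}$. As $Y_i = 1$ implies $\pi(x_2) < \pi(x_1)$, we get

\[
E[Y_i \mid \pi(x_2) < \pi(x_1)] \geq \frac{1-\delta}{6 \Pr[\pi(x_2) < \pi(x_1)]} = \frac{1-\delta}{6 (1 - \Pr[\pi(x_2) > \pi(x_1)])} \geq \frac{1-\delta}{6 (1 - \frac{1-\delta}{2})} = \frac{1-\delta}{3(1+\delta)}.
\]

Overall, we get

\[
\begin{aligned}
\Pr[\text{select } x_1 \mid \pi(x_2) < \pi(x_1)]
&\geq \sum_{a=0}^{n-2} \Pr\left[\sum_{i=3}^{n} Y_i = a \,\middle|\, \pi(x_2) < \pi(x_1)\right] \frac{a+1}{n} \\
&= \frac{1}{n} + \frac{1}{n} \sum_{a=0}^{n-2} a \Pr\left[\sum_{i=3}^{n} Y_i = a \,\middle|\, \pi(x_2) < \pi(x_1)\right] \\
&= \frac{1}{n} + \frac{1}{n} E\left[\sum_{i=3}^{n} Y_i \,\middle|\, \pi(x_2) < \pi(x_1)\right] \\
&\geq \frac{1}{n} + \frac{n-2}{n} \cdot \frac{1-\delta}{3(1+\delta)} \\
&\geq \frac{1-\delta}{3(1+\delta)}.
\end{aligned}
\]

Multiplying with $\Pr[\pi(x_2) < \pi(x_1)] \geq \frac{1-\delta}{2}$, we get

\[
\Pr[\text{select } x_1] \geq \frac{(1-\delta)^2}{6(1+\delta)}.
\]

□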
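The threshold algorithm from the proof of Theorem 13 is simple to state in code. The sketch below assumes that the threshold $r$ is uniform on $\{1, \ldots, n\}$, which matches the count of $a + 1$ favorable outcomes out of $n$ in the analysis; the function name and the index-based return value are illustrative choices, not taken from the paper.

```python
import random


def random_threshold_secretary(values, rng=random):
    """Observe rounds 1..r for a uniformly random r, then accept the first
    item that beats everything seen so far.

    Returns the 0-based arrival index of the accepted item, or None.
    """
    n = len(values)
    r = rng.randint(1, n)                # assumption: r is uniform on {1, ..., n}
    best_seen = max(values[:r])
    for t in range(r, n):
        # Items rejected after round r never exceed best_seen, so comparing
        # against best_seen is the same as comparing against all items so far.
        if values[t] > best_seen:
            return t
    return None
```

The best item is selected whenever the second-best item arrives by round $r$ and the best item arrives after it, which is the event analyzed in the proof.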
B Deferred proofs

B.1 Proofs deferred from Section 2

In this section we restate some of the results from Section 2 and provide complete proofs.


B.1.1 Full proof of Theorem 1

The $(p, q, \delta)$-block-independence property only makes statements about $p$-tuples. We will need the bound also for smaller tuples. Indeed, using a simple counting argument we can show that this is already implicit in the definition.

Lemma 9. If a distribution over permutations is $(p, q, \delta)$-block-independent, then it is also $(p', q, \delta)$-block-independent for any $p' < p$.

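Proof. Given $x_1, \ldots, x_{p'} \in U$ and $b_1, \ldots, b_{p'} \in [q]$, fill up the first tuple with arbitrary distinct entries $x_{p'+1}, \ldots, x_p \in U$. The event $\bigwedge_{i \in [p']} \pi^B(x_i) = b_i$ can now be expressed as the union of all events $\bigwedge_{i \in [p]} \pi^B(x_i) = b_i$ over all tuples $(b_{p'+1}, \ldots, b_p) \in [q]^{p-p'}$. Note that these events are pairwise disjoint. Therefore, the probability of their union is the sum of their probabilities, i.e.,

\[
\Pr\left[\bigwedge_{i \in [p']} \pi^B(x_i) = b_i\right]
= \Pr\left[\bigvee_{(b_{p'+1}, \ldots, b_p) \in [q]^{p-p'}} \bigwedge_{i \in [p]} \pi^B(x_i) = b_i\right]
= \sum_{(b_{p'+1}, \ldots, b_p) \in [q]^{p-p'}} \Pr\left[\bigwedge_{i \in [p]} \pi^B(x_i) = b_i\right].
\]

Using $(p, q, \delta)$-block-independence and $|[q]^{p-p'}| = q^{p-p'}$, we get

\[
\Pr\left[\bigwedge_{i \in [p']} \pi^B(x_i) = b_i\right]
\geq \sum_{(b_{p'+1}, \ldots, b_p) \in [q]^{p-p'}} (1-\delta) \left(\frac{1}{q}\right)^{p}
= q^{p-p'} (1-\delta) \left(\frac{1}{q}\right)^{p}
= (1-\delta) \left(\frac{1}{q}\right)^{p'}.
\]

□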

Proof of Theorem 1. Let $T = \lfloor q/e \rfloor$ denote the index of the block in which the threshold is located. Furthermore, let $x_j \in U$ be the $j$-th best item. We condition on the event that $x_1$ comes in the block with index $i$. To ensure that our algorithm picks this item, it suffices that $x_2$ comes in blocks $1, \ldots, T-1$. Alternatively, we also pick $x_1$ if $x_2$ comes in blocks $i+1, \ldots, q$ and $x_3$ comes in blocks $1, \ldots, T-1$. Continuing this argument, we get

\[
\Pr[\text{correct selection}] \geq \sum_{i=T+1}^{q} \sum_{j=2}^{p} \Pr\left[\pi^B(x_1) = i,\ \pi^B(x_2), \ldots, \pi^B(x_{j-1}) > i,\ \pi^B(x_j) < T\right].
\]
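We can now use Lemma 9 and apply $(j, q, \delta)$-block-independence for $j \leq p$. This gives us

\[
\Pr[\text{correct selection}] \geq \sum_{i=T+1}^{q} \sum_{j=2}^{p} (1-\delta) \frac{1}{q} \left(\frac{q-i}{q}\right)^{j-2} \frac{T-1}{q}.
\]

We now reorder the sums and use the formula for finite geometric series. This gives us

\[
\begin{aligned}
\Pr[\text{correct selection}]
&\geq (1-\delta) \frac{T-1}{q} \sum_{i=T+1}^{q} \frac{1}{q} \sum_{j=2}^{p} \left(\frac{q-i}{q}\right)^{j-2} \\
&= (1-\delta) \frac{T-1}{q} \sum_{i=T+1}^{q} \frac{1}{q} \cdot \frac{1 - \left(\frac{q-i}{q}\right)^{p-1}}{1 - \frac{q-i}{q}} \\
&= (1-\delta) \frac{T-1}{q} \sum_{i=T+1}^{q} \frac{1}{i} \left(1 - \left(\frac{q-i}{q}\right)^{p-1}\right) \\
&\geq (1-\delta) \frac{T-1}{q} \left(1 - \left(\frac{q-T}{q}\right)^{p-1}\right) \sum_{i=T+1}^{q} \frac{1}{i}.
\end{aligned}
\]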

We now apply the following bounds 


T- 1 1 2 

->- 

q e q 


q e 




dx = In 


i=T+l 


X 


g + 1 
r +1 


> In 


9+1 

! + l 


= 1—In 


9 + e 
9 + 1 


In combination, they imply 

Pr [correct selection] > 


1 2 


e q 

>,1_^ 

e q 


(1-5) 1-1- 


1 


and 


> 1 - 




9 + e 

V+l 


- 1 = 1 - 


e — 1 
9 + 1 


1 - 


e — 1 
9 + 1 


( 1 - 5 )( 1 -( 1 -- 


p-i' 


> 1 _ e + 1 
“ e q 


-5-1- 


p-i 


□ 


B.1.2 Relation between the two properties 

Proof of Theorem 0 Note that it is safe to assume p < q as the statement is trivially fulfilled 
otherwise. Consider p distinct items xi,...,Xp G U. To have 7r(xi) < 7r(x2) < ... < vr(xp), it 
suffices that these elements are mapped to different blocks and with an increasing sequence of 
indices. There are (p such sequences. So, overall the probability is at least 



(1-5) 



(9 - pT 


p\ 


(1-5) 



> 1 - 


p 






□ 
Proof of LemmaUl We first define random variables lij = and lij = for all 

E"- h ~ ~ 

i,j G [n]. Note that for all i G [p], Xi = Tr{xi)/n = —. This implies that 


E 


■ p 

TT 

■p. 

p / j 

TT \ \ 

E 

nLi (e;u4,)‘‘ 

.i=l 


1 

_1 




(5) 


By expanding the numerator due to linearity of expectation, we will have a sum of expectation of 
algebraic terms in the numerator, where each algebraic term multiplication of at most x p = k/2 
indicators lij. There are at most 2 x k/2 = k particular items involved in these indicator functions. 

Now, lets look at one of the terms, e.g. E nf=i 


,Xsff\ are involved. 


in which k items , 

The product forces the induced ordering of elements ,..., be in a particular 

subset S C Sk- 


Hence, E 


nf=i hi,ji = Pr [induced ordering by vr over {x^j,..., Xg^f} will be in 5]. Now as vr 


satisfies the (fe, 5)-uniform-induced-ordering property, we have 


E 


fc /2 

n 


l=l 


= Pr [induced ordering by vr over {x^^, ■ ■ ■, will be in S'] 

> (1 — (5)Pr [induced ordering by 4> over {xg^ ■,■■■■, will be in S] 
'k/2 


= (1 -<5)E 



1=1 


( 6 ) 


From (l 6 |) one can conclude that 


HT* 

> (1-J)E 


= (1 - 6)E 


= (1 - (5)E 


-i=l 


[i=i V ^ / J 


Li=i V ^ / J 


.i=l 


(7) 


which completes the proof. 


□ 


Proof of Lemm^ We define continuous functions fi: [0,1] —>• M for f G [p] by 


/ 

0 


for X < Ui or X > bi 


fi{x) 


7 

x-bi 


for Oi < X < Oi + 7 

for — 7 < X < 

for Oj + 7 < X < — 7 


where \bi — ai\ > 27 . Note that all of these functions are continuous and satisfy condition of 
Theorem |3| for oj (x) = ^. 

Observe that fi is point-wise smaller than the indicator function that has value 1 between 

a* and bi and 0 otherwise. Therefore, we have Pr {Xi G [ai, 6 i])] = E [nf=i (^ 0 ] > 

E[nr=i/.(^.)]- 

By Theorem H] for every i there is a polynomial function gi: [0,1] —>• M of degree d such that 
\\fi-gi\\oo < We now have Pi((/>(i)) > > 1^+-^^-^](</>(*))-^ 
and therefore 


E 




_ 2=1 


> E 


n 27\/d) 


3p 


> Pr G [tti + 7 ,5i - 7 ]] - 

p 


flih -a^- 27) - > flih - ai) - (2p7 + > ^[(6, - ^ 


2=1 


2=1 


( 8 ) 


where to get inequality ( 1 ) we set 7 = 4. 


Furthermore, as gi is a polynomial function of 
degree at most d and E 1 I > E Inf=i 4'i'i')^* (1 “ for all ki < d, we get by linearity 
of expectation E [nf=i g{Xi)\ > (1 - (5)E [nf=i gM{i))]. Now we use gi{x) < fi{x) + 
giving us 


flhix,) 

_ 2=1 

> E 

n(9.(v.) - X) 
_2 = 1 

> E 

■ p 

7=1 

1 

,— 1 

Al 

CO 

1 

YlgiiH'i)) 

_ 2=1 


>(i-s) (nii-. - «.i - ^ > (1 - i) (nio. - »<]) - X (9) 

Overall, we get Pr {Xi G [a*, 6 *])] > (1 - 5) (nLi(^* “ “0) - as desired. □ 


B.1.3 Full proof of Theorem [5] 


Proof. For any fe-tuple of indices (ii,..., ij.) we must show that each of the kl possible orderings of 
(w-ii),... ,{w-ik) has probability at least By symmetry it suffices to show that the probability 
of the event {w ■ xi < w ■ X 2 < ■ ■ ■ < w ■ x^} is at least This event is unchanged by rescaling 
w, so we are free to substitute whatever spherically-symmetric distribution we wish. Henceforth 
assume w is sampled from the multivariate normal distribution AA(0,1), whose density function is 
(27r)“‘^/^ exp (i||t(;|p). 

Let Xk denote the matrix whose k columns are the vectors xi,... ,Xk and let A = X^ denote 
its transpose. Scaling xi,...,Xk by a common scalar, if necessary, we are free to assume that 
det{Xj,Xk) = 1. The RIP implies that the ratio of the largest and smallest right singular values of 
Xk is at most and since their product is 1 this means that the smallest singular value is at 

least 


l—5t. 


Now let C = { 2 ; G I zi < 22 < • • • < Zk}. The event {w ■ xi < w ■ X 2 < 
expressed more succinctly as {Aw G C}, and its probability is 


< w ■ Xk} can be 


Pr G C] = [ (27r) exp ( — i||u;||^) dtc. 

Jw&A-"^{C) 


The matrix A is not square, hence not invertible; the notation A~^{C) merely means the inverse- 
image of C under the linear transformation -A represented by A. The Moore-Penrose 
pseudoinverse of A is the matrix Xk{Xj,Xk)~^, which we denote henceforth by A'^. We can write 
any w G A~^{C) uniquely as z-\- A'^y, where y & C, z & ker(^), and z is orthogonal to A~^y. By our 
scaling assumption, the product of the singular values of A~^ equals 1 , which justifies the second 
line in the following calculation 


Pr [Aw E C] 



f f (27r) '^/2exp(-i||2||2 

y&C Jz&.er{A) 


(/^^(27r) '^/^exp (-ip+yf) 
(j|^^^(27r)-^/2exp(-lp+yf)dy). 


ip+yf) dzdy 

{ [ exp ( 

\Jz&kev{A) 



We can rewrite the right side as an integral in spherical coordinates. Let dw denote the volume 
element of the unit sphere C and let 5 = C H 5^“^. Then writing y = ru, where r > 0 

and u is a unit vector, we have 


Pr 


P POO 

[Aw&C]= / / (27r)“*^'^^ exp ( — dr dw(tt) 

Ju^S J r=0 

P POO 

= / (27r)“^/^||74’''u||“^ / exp ( — ds da;(u). 

J uGS j s=0 


The singular values of A'^ are the multiplicative inverses of the singular values of X^, hence the 
largest singular value of A'^ is at most In other words, ||^'’'u|| < for any unit vector u. 

Plugging this bound into the integral above, we find that 

( \k C 1 / \k 

T^) / (27r)-^/2 / exp(-is2)s^-idsda;(u) = -(i^) , 

' JuGS js=0 h. \ / 


where the last equation is derived by observing that the integral is equal to the Gaussian measure 
of C. Finally, by our choice of 6k, we have > 1 — d, which concludes the proof. □ 


B.1.4 Full proof of Theorem [6] 


. We show this claim using the probabilistic method. Permutation vrj: ZY —>■ [n] is drawn uni¬ 
formly at random from the set of all permutations with replacement. We claim that the set 
5* = {tti, ..., TT^} fulfills the stated condition with probability at least 1 — 

Fix k distinct items xi,...,Xk E U. Let Tj = 1 if vrj(xi) < 7rj(x2) < ... < 7rj(xfc). As vrj is 
drawn uniformly from the set of all permutations, we have Pr [1^ = 1] = That is, we have 


E 




This gives us 


A 

k\- 


As the random variables YJ are independent, we can apply a Chernoff bound. 


Pr 



< exp 



= n 


k+l 


Note that if J2i=i ^ “ '^)m then the respective sequence xi,...,Xk E U has probability at 

least (1 — d)^ when drawing one permutation at random from S. 

There are fewer than possible sequences. Therefore, applying a union bound, with probability 
at least 1 — ^ the bound is fulfilled for all sequences simultaneously and therefore S fulfills the stated 
condition. □ 
B.2 Proofs deferred from Section 3

In this section we provide complete proofs of some of the results in Section 3.

Proof of Lemma 6. Let $\Pi$, $|\Pi| \leq k$, be the support of the distribution $\pi$ is drawn from. Lemma 4 shows that there is a sequence $(x_1, \ldots, x_s)$ of length $s \geq \frac{\log n}{k+1}$ that is semitone with respect to every permutation in $\Pi$.

It only remains to apply Yao’s principle: Instead of considering the performance of a random $\pi$ against a deterministic adversary, we consider the performance of a fixed $\pi$ against a randomized adversary. Lemma 5 shows that there is a distribution over instances such that no $\pi \in \Pi$ has success probability better than $\frac{1}{s}$. □

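Proof of Lemma ??. Set $\alpha = \frac{H}{\log(k-3)}$ and $\beta = \frac{\alpha}{k-3}$. Note that for $\alpha \ge \frac{1}{8}$, the statement becomes
trivial. Therefore, we can assume without loss of generality that $\alpha < \frac{1}{8}$. This implies $\log(\alpha) < 0$.
Therefore, we get

\[ \frac{H}{-\log \beta} = \frac{\alpha \log(k-3)}{\log(k-3) - \log(\alpha)} \le \alpha . \]

Let $a_1, \dots, a_{k'}$ be the elements of $V$ such that $p_{a_i} \ge \beta$ for all $i$ and $p_{a_1} \ge p_{a_2} \ge \dots \ge p_{a_{k'}}$.
Furthermore, partition $V \setminus \{a_1, \dots, a_{k'}\}$ into $S_1, \dots, S_\ell$ such that $p_{S_i} \in [\beta, 2\beta)$ for $i < \ell$ and $p_{S_\ell} < 2\beta$.

Observe that $p_{a_i} \le \frac{1}{3}$ for $i \ge 3$ because probabilities sum up to at most 1. Therefore, for $i \ge 3$, we have
$-p_{a_i} \log(p_{a_i}) \ge -\beta \log \beta$ by monotonicity of $x \mapsto -x \log x$ on $[0, 1/e]$. Furthermore, for all $j < \ell$, we have
$-p_{S_j} \log(p_{S_j}) \ge -\beta \log \beta$. In combination, this gives us

\[ H \ge \sum_{i=3}^{k'} -p_{a_i} \log(p_{a_i}) + \sum_{j=1}^{\ell-1} -p_{S_j} \log(p_{S_j}) \ge (k' + \ell - 3)(-\beta \log \beta) . \]

For $k'$ and $\ell$, this implies

\[ k' \le \frac{H}{-\beta \log \beta} + 3 \le \frac{\alpha}{\beta} + 3 \le k \quad\text{and}\quad \ell \le \frac{H}{-\beta \log \beta} + 3 \le \frac{\alpha}{\beta} + 3 . \]

In conclusion, we have

\[ \sum_{i=1}^{\ell} p_{S_i} \le 2\beta\ell \le 2\alpha + 6\beta \le 8\alpha . \]

□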
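
The conclusion of this proof can be sanity-checked numerically. The sketch below is an illustration under stated assumptions: it takes $\alpha = H/\log(k-3)$ and $\beta = \alpha/(k-3)$, as the displayed identity for $H/(-\log\beta)$ suggests, and the function name `heavy_light_split` and the example distribution are hypothetical. It splits a distribution into elements of probability at least $\beta$ and the rest, and compares the mass of the light part with $8\alpha$.

```python
import math

def heavy_light_split(probs, k):
    """Return (alpha, number of heavy elements, total mass of light elements),
    where an element is heavy if its probability is at least beta."""
    if k < 5:
        raise ValueError("k must be at least 5 so that log(k - 3) > 0")
    probs = [p for p in probs if p > 0]
    H = -sum(p * math.log(p) for p in probs)    # entropy; the log base cancels in alpha
    alpha = H / math.log(k - 3)
    beta = alpha / (k - 3)
    n_heavy = sum(1 for p in probs if p >= beta)
    light_mass = sum(p for p in probs if p < beta)
    return alpha, n_heavy, light_mass

# Example: one dominant outcome plus a long, thin tail of 10,000 outcomes.
probs = [0.99] + [0.01 / 10_000] * 10_000
k = 103
alpha, n_heavy, light_mass = heavy_light_split(probs, k)
print(alpha, n_heavy, light_mass, 8 * alpha)
```

In this example $\alpha < 1/8$, only the dominant outcome is heavy, and the tail carries mass $0.01$, well below $8\alpha$.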


B.3 Proofs deferred from ??

In this section we restate some of the results from ?? and provide complete proofs.

Lemma 10. If $g : \{0,1\}^n \to [k]$ is a random function, then with high probability there is no circuit
of size $s(n) = 2^n/(8kn)$ that outputs the function value correctly on more than $\frac{2}{k}$ fraction of inputs.

Proof. The proof closely parallels the proof of the corresponding statement for worst-case hardness
rather than hardness-on-average, which is presented, for example, in the textbook by Arora and
Barak [??]. The number of Boolean circuits of size $s$ is bounded by $s^{3s}$. For any one of these circuits,
$C$, the expected number of inputs $x$ such that $C(x) = g(x)$ is $\frac{1}{k} \cdot 2^n$. Since the events $\{C(x) = g(x)\}$
are mutually independent as $x$ varies over $\{0,1\}^n$, the Chernoff bound (e.g., [??]) implies that the
probability of more than $\frac{2}{k} \cdot 2^n$ of these events taking place is less than $\exp(-\frac{1}{3k} \cdot 2^n)$. The union
bound now implies that the probability there exists a circuit $C$ of size $s$ that correctly computes $g$
on more than $\frac{2}{k}$ fraction of inputs is bounded above by $\exp(3s \ln(s) - \frac{1}{3k} \cdot 2^n)$. When $s = 2^n/(8kn)$
this yields the stated high-probability bound. □
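
The final step of this proof is an arithmetic claim that is easy to probe numerically. The sketch below assumes the union bound has the form $\exp(3s\ln s - \frac{1}{3k}2^n)$; the constant $\frac{1}{3k}$ is a reconstruction from the garbled source, and the function name is hypothetical. It evaluates the exponent at $s = 2^n/(8kn)$.

```python
import math

def log_union_bound(n, k):
    """Exponent 3*s*ln(s) - 2^n/(3k) of the union bound at s = 2^n / (8*k*n)."""
    s = 2**n / (8 * k * n)
    return 3 * s * math.log(s) - 2**n / (3 * k)

for n in (16, 24, 32):
    print(n, log_union_bound(n, k=4))
```

Since $\ln s < n \ln 2$, the first term is below $\frac{3\ln 2}{8k}2^n \approx \frac{0.26}{k}2^n$, so the exponent is negative and its magnitude grows proportionally to $2^n/k$.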

In the sequel we will need a version of the lemma above in which the circuit, rather than being 
constructed from the usual Boolean gates, is constructed from t different types of gates, each having 
m binary inputs and one binary output. 

Lemma 11. Suppose we are given $t$ types of gates, each computing a specific function from $\{0,1\}^m$
to $\{0,1\}$. If $g : \{0,1\}^n \to [k]$ is a random function, then with high probability there is no circuit
of size $s(n) < 2^n/(8k \cdot \max\{mn, \ln(t)\})$ that outputs the function value correctly on more than $\frac{2}{k}$
fraction of inputs.

Proof. The proof is the same except that the number of circuits, rather than being bounded by
$s^{3s}$, is now bounded by $(t s^m)^s = \exp(s \ln t + ms \ln s)$. The stated high-probability bound continues
to hold if $ms \ln s < \frac{1}{6k} \cdot 2^n$ and $s \ln t < \frac{1}{6k} \cdot 2^n$. The assumption $s(n) < 2^n/(8k \cdot \max\{mn, \ln(t)\})$
justifies these two inequalities and completes the proof. □

B.3.1 Proof of Theorem 10

Similar to the proof of Theorem 9, our plan for proving Theorem 10 is to construct a distribution
over arrival orderings, $\underline{\pi}$, in which the first half of the input sequence attempts to encode the
position where the maximum-value item occurs in the second half of the permutation. What
makes the proof more difficult is that the adversary chooses the ordering of items by value (as
represented by a permutation $\sigma \in S_n$), and this ordering could potentially be chosen to thwart the
decoding process. In our construction we will make a distinction between decodable orderings,
whose properties will guarantee that our decoding algorithm succeeds in finding the maximum-value
item, and non-decodable orderings, which may lead the decoding algorithm to make an error. We
will then design a separate algorithm that succeeds with constant probability when the adversary's
ordering is non-decodable.

We will assume throughout the proof that $n$ is divisible by 8, for convenience. Recall, also,
that the theorem statement declares $\kappa(n)$ to be any function of $n$ such that $\lim_{n\to\infty} \kappa(n) = 0$
while $\lim_{n\to\infty} \frac{n\kappa(n)}{\log n} = \infty$. For convenience we adopt the notation $[a, b]$ to denote the subset of $[n]$
consisting of integers in the range from $a$ to $b$, inclusive; analogously, we may refer to subsets of $[n]$
using open or half-open interval notation.

Definition 6. A total ordering of $[n]$ is decodable if it satisfies the following properties.

1. The maximal element of the ordering is $n$.

2. When the elements of the set $(\frac{n}{4}, \frac{n}{2}]$ are written in decreasing order with respect to the total
ordering, the fraction of the first $n \cdot \kappa(n)$ elements that belong to $(\frac{3n}{8}, \frac{n}{2}]$ is at least […].

Our proof will involve the construction of three distributions over permutations, and three
corresponding algorithms.

• an “encrypting” distribution $\underline{\pi}_e$ that hides item $n$ in the second half of the permutation while
arranging the first half of the permutation to form a “clue” that reveals the location of item
$n$, but does so in a way that cannot be decrypted by small circuits;

• a first “adversary-coercing” distribution $\underline{\pi}_{c,1}$ that forces the adversary to make item $n$ the
most valuable item;

• a second “adversary-coercing” distribution $\underline{\pi}_{c,2}$ that forces the adversary to satisfy the second
property in the definition of a decodable ordering.

Corresponding to these three distributions we will define algorithms $\mathrm{ALG}_e$, $\mathrm{ALG}_{c,1}$, $\mathrm{ALG}_{c,2}$ such that:

• $\mathrm{ALG}_e$ has constant probability of correct selection when the adversary chooses a decodable
ordering;

• $\mathrm{ALG}_{c,1}$ has constant probability of correct selection when the adversary chooses an ordering
that violates the first property of decodable orderings;

• $\mathrm{ALG}_{c,2}$ has constant probability of correct selection when the adversary chooses an ordering
that satisfies the first property of decodable orderings but violates the second.

Combining these three statements, one can easily conclude that when nature samples the arrival
order using the permutation distribution $\underline{\pi} = \frac{1}{3}(\underline{\pi}_e + \underline{\pi}_{c,1} + \underline{\pi}_{c,2})$, and when the algorithm ALG is
the one that randomizes among the three algorithms $\{\mathrm{ALG}_e, \mathrm{ALG}_{c,1}, \mathrm{ALG}_{c,2}\}$ with equal probability,
then ALG has constant probability of correct selection no matter how the adversary orders items
by value.

We begin with the construction of the distribution $\underline{\pi}_{c,1}$ and algorithm $\mathrm{ALG}_{c,1}$. The following
describes the procedure of drawing a random sample from $\underline{\pi}_{c,1}$.


Algorithm 1 Sampling procedure for $\underline{\pi}_{c,1}$

1: Sample an $(\frac{n}{2})$-element set $L \subset [n-1]$ uniformly at random.

2: Let $R = [n-1] \setminus L$.

3: Let $\pi'$ be the permutation that lists the elements of $L$ in increasing order, followed by the
elements of $R$ in increasing order, followed by $n$.

4: Choose a uniformly random $i \in [1, \frac{n}{2}]$ and let $\tau_i$ be the transposition that swaps elements $n$
and $i$. (If $i = n$ then $\tau_i$ is the identity permutation.)

5: Let $\pi = \tau_i \circ \pi'$.
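
The sampling procedure can be sketched in code. This is a minimal illustration under assumptions, not the authors' implementation: the arrival order is represented as a list of items by position, $L$ is taken to have $n/2$ elements, $i$ is drawn from $\{1, \dots, n/2\}$, and the transposition is read as moving item $n$ into position $i$ (consistent with how the proof of Lemma 12 describes $\pi$). The function name `sample_pi_c1` is hypothetical.

```python
import random

def sample_pi_c1(n, rng=random):
    """Return (order, i): order[p - 1] is the item arriving at position p,
    and i is the position into which item n was swapped."""
    if n % 2 != 0 or n < 4:
        raise ValueError("n must be even and at least 4")
    half = n // 2
    L = sorted(rng.sample(range(1, n), half))        # (n/2)-element subset of [n-1]
    in_L = set(L)
    R = [x for x in range(1, n) if x not in in_L]    # remaining items, increasing
    order = L + R + [n]                              # pi': L, then R, then item n
    i = rng.randint(1, half)                         # uniform index in [1, n/2]
    order[i - 1], order[n - 1] = order[n - 1], order[i - 1]
    return order, i

order, i = sample_pi_c1(16, random.Random(0))
print(order, i)
```

Apart from position $i$, which now holds item $n$, the first $n/2$ positions still list elements of $L$ in increasing order.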


Define $\mathrm{ALG}_{c,1}$ to be an algorithm that observes the first $\frac{n}{2}$ elements, sets a threshold equal to
the maximum of the observed elements, and selects the next element whose value exceeds this
threshold. In the following lemma and for the remainder of this section, $\rho$ denotes the permutation
that lists the items in order of increasing value, i.e., $\rho(i) = n - i$.
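
The threshold rule used by this algorithm can be written in a few lines. The sketch below is illustrative only; the function name `threshold_rule` and the representation of the input as a list of values in arrival order are assumptions.

```python
def threshold_rule(values, sample_size):
    """Observe the first sample_size values, then return the 0-based arrival
    index of the first later value exceeding their maximum (None if there is none)."""
    if not 0 < sample_size <= len(values):
        raise ValueError("sample_size must be between 1 and len(values)")
    threshold = max(values[:sample_size])
    for t in range(sample_size, len(values)):
        if values[t] > threshold:
            return t
    return None

print(threshold_rule([3, 1, 4, 2, 9, 5], sample_size=3))
```

In the example the threshold is 4, so the rule skips the value 2 and stops at the value 9.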

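Lemma 12. If the adversary's ordering $\sigma$ assigns the maximum value to any item other than $n$,
then $V^{\underline{\pi}_{c,1}}(\mathrm{ALG}_{c,1}, \sigma) > \frac{1}{4}$. On the other hand, $V^{\underline{\pi}_{c,1}}(*, \rho) = \frac{2}{n}$.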

Proof. Suppose that $\sigma$ assigns the maximum value to item $i \ne n$, and suppose that item $j$ receives
the second-largest value among items in $[n-1]$. In the sampling procedure for $\underline{\pi}_{c,1}$, the event that
$j \in L$ and $i \in R$ has probability

\[ \frac{n/2}{n-1} \cdot \frac{(n/2)-1}{n-2} > \frac{1}{4}, \]

and when this event happens the algorithm $\mathrm{ALG}_{c,1}$ is guaranteed to select item $i$.

To prove the second part of the lemma, suppose the adversary assigns values to items in
increasing order and observe that this guarantees that the first $n/2$ items in the permutation $\pi'$
(defined in Step 3 of the sampling procedure) are listed in increasing order of value, and that the
first $n/2$ items in $\pi$ are the same except that the value at index $i$ is replaced by the maximum value.
Now consider any algorithm and let $t$ denote the time when it makes its selection when facing a
monotonically increasing sequence of $n$ values. If $t > n/2$, then the algorithm assuredly makes an
incorrect selection when facing the input sequence $\pi\rho$. If $t \le n/2$, then the algorithm makes a
correct selection if and only if $t$ matches the random index $i$ chosen in the sampling procedure for
$\pi$, an event with probability $2/n$. □

We next present the construction of $\underline{\pi}_{c,2}$.


Algorithm 2 Sampling procedure for $\underline{\pi}_{c,2}$

1: With probability $\frac{1}{2}$ reverse the order of the first $\frac{n}{4}$ items in the list.

2: Initialize $I = \emptyset$ and $I^+ = \emptyset$.

3: for $i = 1, \dots, \frac{n}{4}$ do
4:   With probability $\frac{1}{n\kappa(n)}$:

5:     Swap the items in positions $i$ and $i + \frac{n}{4}$.

6:     Add $i$ to the set $I$.

7:     If $i > \frac{n}{8}$ then add $i$ into the set $I^+$.

8: end for

9: Let $\pi'$ denote the permutation of items defined at this point in the procedure.
10: if $I^+$ is non-empty then
11:   Choose a uniformly random index $i \in I^+$.

12: else

13:   Choose a uniformly random index $i \in (\frac{n}{8}, \frac{n}{4}]$.

14: end if

15: Let $\tau_i$ be the transposition that swaps elements $n$ and $i$.

16: Let $\pi = \tau_i \circ \pi'$.


Define $\mathrm{ALG}_{c,2}$ to be an algorithm that observes the first $\frac{n}{8}$ elements, sets a threshold equal to
the maximum of the observed elements, and selects the next element whose value exceeds this
threshold.

Lemma 13. If the adversary's ordering $\sigma$ assigns the maximum value to item $n$ but violates
Property 2 in the definition of a decodable ordering, then $V^{\underline{\pi}_{c,2}}(\mathrm{ALG}_{c,2}, \sigma) \ge \frac{1}{200}$. On the other hand,
$V^{\underline{\pi}_{c,2}}(*, \rho) = O(\kappa(n))$.

Proof. First suppose that $\sigma$ assigns the maximum value to item $n$ but violates Property 2 in the
definition of a decodable ordering. To prove that $V^{\underline{\pi}_{c,2}}(\mathrm{ALG}_{c,2}, \sigma) \ge \frac{1}{200}$, note first that $\mathrm{ALG}_{c,2}$ is
guaranteed to make a correct selection if the permutation $\pi'$ (defined in Step 9 of the sampling
procedure) has the property that the highest-value item found among the first $n/4$ positions in $\pi'$
belongs to one of the first $\frac{n}{8}$ positions. Recalling the set $I$ defined in the sampling procedure, let
$J = [1, \frac{n}{4}] \setminus I$ and let $K = \{i + \frac{n}{4} \mid i \in I\}$. Note that $J \cup K$ is the set of items found among the
first $n/4$ positions in $\pi'$. Let $j, k$ denote the highest-value elements of $J$ and $K$, respectively. (If
$K$ is empty then $k$ is undefined.) Step 1 of the sampling procedure ensures that with probability
$\frac{1}{2}$ item $j$ belongs to one of the first $\frac{n}{8}$ positions, and this event is independent of the event that $K$
is non-empty and $k$ belongs to one of the first $\frac{n}{8}$ positions. To complete the proof, we now bound
the probability of that event from below by $\frac{1}{100}$.

Let $i_1, i_2, \dots, i_{n/4}$ denote a listing of the elements of the set $(\frac{n}{4}, \frac{n}{2}]$ in decreasing order of value.
For $1 \le \ell \le \frac{n}{4}$ the probability that $k = i_\ell$ is $\big(1 - \frac{1}{n\kappa(n)}\big)^{\ell-1} \cdot \frac{1}{n\kappa(n)}$. Let $L$ denote the set of $\ell \le n\kappa(n)$
such that $i_\ell \le \frac{3n}{8}$. By our hypothesis that $\sigma$ violates Property 2 in the definition of a decodable
ordering, we know that […]. If $k = i_\ell$ for any $\ell \in L$, then $k$ belongs to one of the first $\frac{n}{8}$
positions in $\pi'$. The probability of this event is

\[ \sum_{\ell \in L} \Pr[k = i_\ell] = \sum_{\ell \in L} \Big(1 - \frac{1}{n\kappa(n)}\Big)^{\ell-1} \frac{1}{n\kappa(n)} \ge \frac{1}{n\kappa(n)} \sum_{\ell \in L} \Big(1 - \frac{1}{n\kappa(n)}\Big)^{n\kappa(n)-1} \ge \frac{|L|}{n\kappa(n)} \cdot \frac{1}{e} \ge \frac{1}{100}, \]

as desired.

The second half of the lemma asserts that $V^{\underline{\pi}_{c,2}}(*, \rho) = O(\kappa(n))$, where $\rho$ denotes the
permutation that lists the items in order of increasing value. To prove this, first recall the set $I^+$ defined
in the sampling procedure; note that $|I^+|$ is equal to the number of successes in $\frac{n}{8}$ i.i.d. Bernoulli
trials with success probability $\frac{1}{n\kappa(n)}$. Hence $\mathbf{E}[|I^+|] = \frac{1}{8\kappa(n)}$ and, by the Chernoff Bound,

\[ \Pr\Big[|I^+| < \frac{1}{16\kappa(n)}\Big] < \exp\Big(-\frac{1}{128\kappa(n)}\Big) < 128\kappa(n). \]

Conditional on the event that $|I^+| \ge \frac{1}{16\kappa(n)}$, the conclusion of the proof is similar to the conclusion
of the proof of Lemma 12. Consider any algorithm and let $t$ denote the time when it makes its
selection when the items are presented in the order $\pi'$. Also, let $s$ denote the time when item $n$
is presented in the order $\pi$. The first $s$ items in $\pi$ and $\pi'$ have exactly the same relative ordering
by value since, by construction, $s \in I^+$ and hence the element that arrives at time $s$ in $\pi'$ has
the maximum value observed so far. Hence, the algorithm makes a correct selection only when
$t = s$. However, if $t \notin I^+$ then this event does not happen, and if $t \in I^+$ the event $t = s$ happens
only if $s$ is the random index $i$ selected in Step 11, an event whose probability is $1/|I^+|$, which is
at most $16\kappa(n)$ since we are conditioning on $|I^+| \ge \frac{1}{16\kappa(n)}$. Combining our bounds for the cases
$|I^+| < \frac{1}{16\kappa(n)}$ and $|I^+| \ge \frac{1}{16\kappa(n)}$, we find that for any algorithm ALG,

\[ V^{\underline{\pi}_{c,2}}(\mathrm{ALG}, \rho) \le \Pr\Big[|I^+| < \tfrac{1}{16\kappa(n)}\Big] \cdot \Pr\Big[\text{correct selection} \,\Big|\, |I^+| < \tfrac{1}{16\kappa(n)}\Big] + \Pr\Big[|I^+| \ge \tfrac{1}{16\kappa(n)}\Big] \cdot \Pr\Big[\text{correct selection} \,\Big|\, |I^+| \ge \tfrac{1}{16\kappa(n)}\Big] \]
\[ \le 128\kappa(n) + 16\kappa(n), \]

as desired. □
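
The tail bound on $|I^+|$ used in this proof can be compared with an exact binomial computation. The sketch below is illustrative; it assumes $|I^+|$ is binomial with $n/8$ trials and success probability $1/(n\kappa(n))$, and the function name and the values of $n$ and $\kappa$ are chosen only for the example.

```python
import math

def prob_small_I_plus(n, kappa):
    """Exact Pr[|I+| < 1/(16*kappa)] for |I+| ~ Binomial(n/8, 1/(n*kappa))."""
    trials = n // 8
    p = 1.0 / (n * kappa)
    cutoff = 1.0 / (16 * kappa)
    total = 0.0
    j = 0
    while j < cutoff and j <= trials:
        total += math.comb(trials, j) * p**j * (1 - p) ** (trials - j)
        j += 1
    return total

n, kappa = 8000, 0.001
tail = prob_small_I_plus(n, kappa)
print(tail, math.exp(-1 / (128 * kappa)), 128 * kappa)
```

For these parameters the mean of $|I^+|$ is 125 and the cutoff is 62.5, so the exact tail is far below both $\exp(-1/(128\kappa))$ and $128\kappa$.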

Finally, we present the construction of the permutation distribution $\underline{\pi}_e$. A crucial ingredient is
a coding-theoretic construction that may be of independent interest.

Definition 7. We say that a function $A : \{0,1\}^k \to \{0,1\}^n$ has half-unique-decoding radius $r$ if at
least half of the inputs $x \in \{0,1\}^k$ satisfy the property that for all $x' \ne x$, the Hamming distance
from $A(x)$ to $A(x')$ is greater than $2r$.
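
For small codes the definition can be checked by brute force. The sketch below is an illustration; the function names and the toy threefold repetition code are assumptions made for the example.

```python
from itertools import product

def hamming(u, v):
    """Hamming distance between two equal-length sequences."""
    return sum(a != b for a, b in zip(u, v))

def has_half_unique_decoding_radius(encode, k, r):
    """True if at least half of the k-bit inputs x have every other
    codeword at Hamming distance greater than 2r from encode(x)."""
    inputs = list(product((0, 1), repeat=k))
    words = {x: encode(x) for x in inputs}
    good = sum(
        all(hamming(words[x], words[y]) > 2 * r for y in inputs if y != x)
        for x in inputs
    )
    return 2 * good >= len(inputs)

def repeat3(x):
    """Toy code: repeat the message three times."""
    return tuple(x) * 3

print(has_half_unique_decoding_radius(repeat3, k=2, r=1))
print(has_half_unique_decoding_radius(repeat3, k=2, r=2))
```

Distinct codewords of the repetition code differ in at least 3 positions, so the property holds for $r = 1$ (distance greater than 2) but fails for $r = 2$ (distance greater than 4 is required).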

Codes with half-unique-decoding radius $r$ are useful because they allow a receiver to decode
messages with probability at least $\frac{1}{2}$ in a model with random messages and adversarial noise. The
following easy lemma substantiates this interpretation. Here and subsequently, we use $\|y - y'\|$ to
denote the Hamming distance between strings $y, y'$.

Lemma 14. Suppose that:

• a uniformly random string $x \in \{0,1\}^k$ is encoded using a function $A$ whose
half-unique-decoding radius is $r$,

• an adversary is allowed to corrupt any $r$ bits of the resulting codeword, and

• a decoding algorithm receives the corrupted string $\hat{y}$, finds the nearest codeword (breaking ties
arbitrarily), and applies the function $A^{-1}$ to produce an estimate $\hat{x}$ of the original message.

Then $\Pr[\hat{x} = x] \ge \frac{1}{2}$ regardless of the adversary's policy for corrupting the transmitted codeword.

Proof. Let $y = A(x)$. The constraint on the adversary implies that $\|y - \hat{y}\| \le r$. The definition of
half-unique-decoding radius implies that, with probability at least $\frac{1}{2}$ over the random choice of $x$,
the nearest codeword to $y$ other than $y$ itself is at Hamming distance greater than $2r$. By the triangle
inequality, this event implies that $y$ is the unique nearest codeword to $\hat{y}$, in which case the decoder succeeds. □
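
The decoder in the lemma is plain nearest-codeword decoding. The following sketch illustrates it on a toy threefold repetition code; the function names and the example message are assumptions, and the exhaustive search is only practical for very small $k$.

```python
from itertools import product

def hamming(u, v):
    """Hamming distance between two equal-length sequences."""
    return sum(a != b for a, b in zip(u, v))

def nearest_codeword_decode(encode, k, received):
    """Return a k-bit message whose codeword is nearest to `received`
    (ties are broken by enumeration order)."""
    return min(product((0, 1), repeat=k),
               key=lambda x: hamming(encode(x), received))

def repeat3(x):
    """Toy code: repeat the message three times."""
    return tuple(x) * 3

message = (1, 0)
corrupted = list(repeat3(message))
corrupted[4] ^= 1                      # the adversary flips r = 1 bit
decoded = nearest_codeword_decode(repeat3, 2, tuple(corrupted))
print(decoded)
```

Because every pair of distinct codewords of this toy code is at distance at least 3, a single corrupted bit leaves the original codeword as the unique nearest one.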

The particular coding construction that our proof requires is a code with the property that,
roughly speaking, all of its low-dimensional projections have large half-unique-decoding radius. The
following definition and lemma make this notion precise.

Definition 8. For $S \subseteq [n]$, let $\mathrm{proj}_S : \{0,1\}^n \to \{0,1\}^{|S|}$ denote the function that projects a
vector onto the coordinates indexed by $S$. In other words, letting $(i_1, i_2, \dots, i_s)$ denote a sequence
containing each element of $S$ once, we define $\mathrm{proj}_S(y) = (y_{i_1}, y_{i_2}, \dots, y_{i_s})$. For any function
$A : \{0,1\}^k \to \{0,1\}^n$, we introduce the notation $A_S$ to denote the composition $\mathrm{proj}_S \circ A : \{0,1\}^k \to \{0,1\}^{|S|}$.
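
The projection and the induced function $A_S$ are straightforward to express in code. The sketch below is illustrative: it uses 0-based coordinates rather than the 1-based indexing of the text, and the names `proj`, `restrict`, and the toy encoder are assumptions.

```python
def proj(S, y):
    """Project the vector y onto the coordinates listed in S (0-based)."""
    return tuple(y[i] for i in S)

def restrict(encode, S):
    """Return the function A_S = proj_S composed with A, for an encoder A."""
    return lambda x: proj(S, encode(x))

def repeat3(x):
    """Toy encoder: repeat the message three times."""
    return tuple(x) * 3

A_S = restrict(repeat3, S=(0, 3, 5))
print(proj((0, 2), (1, 0, 1, 1)), A_S((1, 0)))
```

Here `repeat3((1, 0))` is `(1, 0, 1, 0, 1, 0)`, so projecting onto coordinates 0, 3 and 5 gives `(1, 0, 0)`.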

Lemma 15. For all sufficiently large $m$, if $2m \le n \le \frac{m}{2} \cdot 2^{2^{m-4}}$, there exists a function
$A : \{0,1\}^m \to \{0,1\}^n$ such that for every set $S \subseteq [n]$ of cardinality $2m$, the function $A_S$ has
half-unique-decoding radius […].

Proof. We prove existence of $A$ using the probabilistic method, by showing that a uniformly random
function $A : \{0,1\}^m \to \{0,1\}^n$ has the property with positive probability. Throughout, let $r$ denote the
radius claimed in the lemma. To do so, we need to
estimate the probability, for a given set $S$, that $A_S$ fails to have half-unique-decoding radius $r$.

Define a graph $G_S$ with vertex set $\{0,1\}^m$ by drawing an edge between every two vertices
$x, x'$ such that $\|A_S(x) - A_S(x')\| \le 2r$. The event that $A_S$ has half-unique-decoding radius $r$
corresponds precisely to the event that $G_S$ has at least $2^{m-1}$ isolated vertices. When this event
does not happen, the number of connected components in $G_S$ is at most $2^m - 2^{m-2}$, so a spanning
forest of $G_S$ has at least $2^{m-2}$ edges.

Our plan is to bound, for every set $S \subseteq [n]$ of size $2m$ and every forest $F$ with $2^{m-2}$ edges, the
probability that $G_S$ contains all the edges of $F$. Summing over $S$ and $F$ we will find the sum is less
than 1, which implies, by the union bound, that with positive probability over the random choice of
$A$ no such pair $(S, F)$ exists. By the arguments in the preceding paragraph, it follows that when no
such pair $(S, F)$ exists the half-unique-decoding radius of $A_S$ is $r$ for every $S$ of size $2m$, yielding
the lemma.

To begin, let us fix $x, x' \in \{0,1\}^m$ and $S \subseteq [n]$ with $|S| = 2m$, and let us estimate the
probability that $\|A_S(x) - A_S(x')\| \le 2r$. The strings $A_S(x)$ and $A_S(x')$ are independent uniformly
random binary strings of length $2m$. The number of binary strings within Hamming distance $2r$
of $A_S(x)$ is bounded above by […], where $H(p)$ denotes the binary entropy function
$-p \log_2(p) - (1-p) \log_2(1-p)$. Using the fact that $2H(\,$[…]$\,) < 0.95$ we can conclude that for large
enough $m$, fewer than […] binary strings belong to the Hamming ball of radius $2r$ around $A_S(x)$.
Hence the probability that $A_S(x')$ is one of these strings is less than […]. If $F$ is the edge set of
a forest on vertex set $\{0,1\}^m$, then the random variables $A_S(x) - A_S(x')$ are mutually independent
as $(x, x')$ ranges over the edges of $F$. Consequently the probability that all the edges of $F$ are
contained in $G_S$ is less than […].

Let $N = 2^m$. The number of spanning trees of an $N$-element vertex set is $N^{N-2}$ [??], and
the number of forests with $N/4$ edges contained in any one such tree is $\binom{N-1}{N/4}$. Thus, the number
of pairs $(S, F)$ where $S \subseteq [n]$ has $2m$ elements and $F$ is the edge set of a forest with vertex set
$\{0,1\}^m$ is bounded above by $\binom{n}{2m} N^{N-2} \binom{N-1}{N/4}$. By the union bound, we conclude that the
probability of failure for our construction is bounded above by

\[ […] \]

where we have used the inequalities $\binom{n}{k} \le \big(\frac{en}{k}\big)^k$ (valid for all $0 < k \le n$) and $\binom{N-1}{N/4} < 2^N$ (valid
for all $N$). The base-2 logarithm of the probability of failure is bounded above by

\[ 2m[1 + \log(n) - \log(m)] + N[\,\cdots] . \]

(All logs are base 2.) Substituting […] and rearranging terms, we find that this expression
is negative (i.e., the probability of failure is strictly less than 1) when

\[ \log(n) < \log(m) - 1 + \frac{2^{m-3}}{m}\Big[\frac{79m}{80} - 1\Big], \]

and the right-hand side is at least $\log(m) - 1 + 2^{m-4}$
provided $m > 2$. This inequality is satisfied when $n \le \frac{m}{2} \cdot 2^{2^{m-4}}$, which completes the proof. □

We now continue with the construction of the permutation distribution $\underline{\pi}_e$. Let $m = $ […].
By Lemma 15 there exists a function $A : \{0,1\}^m \to \{0,1\}^{n/8}$ such that for all $S \subseteq [n]$ with
$|S| = 2m$, the half-unique-decoding radius of $A_S$ is […]. Let us choose one such function $A$ for
the remainder of the construction. Define an $A$-augmented circuit to be a circuit constructed from
the usual AND, OR, NOT gates along with $n/8$ additional types of gates that take an $m$-bit input
$x$ and output one of the bits of $A(x)$. By Lemma 11 there exists a function $g : \{0,1\}^m \to [\frac{n}{2}]$
such that no $A$-augmented circuit of size $s(n) < 2^m/(4m^2 n)$ computes the value of $g$ correctly
on more than $\frac{4}{n}$ fraction of inputs. Let us choose one such function and denote it by $g$ for the
remainder of the construction. (To justify the application of Lemma ??, note that our assumption
that $n\kappa(n)/\log(n) \to \infty$ implies […] for all sufficiently large $n$.) Armed with the functions
$g$ and $A$ we are ready to present the construction of $\underline{\pi}_e$.


Algorithm 3 Sampling procedure for $\underline{\pi}_e$
1: Sample $x \in \{0,1\}^m$ uniformly at random.

2: Let $y = A(x) \in \{0,1\}^{n/8}$.

3: for $i = 1, \dots, \frac{n}{8}$ do

4:   if $y_i = 1$ then

5:     Swap the items in positions $\frac{n}{4} + i$ and $\frac{3n}{8} + i$.

6:   else

7:     Leave the permutation unchanged.

8:   end if

9: end for

10: Swap the items in positions $n$ and $\frac{n}{2} + g(x)$.


The corresponding algorithm $\mathrm{ALG}_e$ works as follows.

Algorithm 4 Algorithm $\mathrm{ALG}_e$
1: Observe the first $\frac{n}{2}$ elements of the input sequence.

2: Let $J$ denote the set of items with arrival times in the interval $(\frac{n}{4}, \frac{n}{2}]$, i.e. $J = \pi^{-1}((\frac{n}{4}, \frac{n}{2}])$.

3: Let $j_1, \dots, j_{n/4}$ denote a listing of the elements of $J$ in order of decreasing value.

4: for $\ell = 1, \dots, 2m$ do

5:   if $\pi(j_\ell) \le \frac{3n}{8}$ then

6:     Set $y_\ell = 1$ and $i_\ell = \pi(j_\ell) - \frac{n}{4}$.

7:   else

8:     Set $y_\ell = 0$ and $i_\ell = \pi(j_\ell) - \frac{3n}{8}$.

9:   end if

10: end for

11: Set $S = (i_1, i_2, \dots, i_{2m})$.

12: Find the $x \in \{0,1\}^m$ that minimizes $\|A_S(x) - y\|$, breaking ties arbitrarily.

13: Select the item that arrives at time $t = \frac{n}{2} + g(x)$.


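Lemma 16. If the adversary's ordering $\sigma$ is a decodable ordering, then $V^{\underline{\pi}_e}(\mathrm{ALG}_e, \sigma) \ge \frac{1}{2}$. On
the other hand, for any algorithm ALG whose stopping rule can be computed by circuits of size
$s(n) = $ […], we have $V^{\underline{\pi}_e}(\mathrm{ALG}, \iota) < \frac{4}{n}$.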

Proof. Note that a permutation $\pi$ sampled from $\underline{\pi}_e$ always maps the set $(\frac{n}{4}, \frac{n}{2}]$ to itself, though it
may permute the elements of that set. Consequently, when one runs $\mathrm{ALG}_e$ on an input sequence
ordered using $\pi$ in the support of $\underline{\pi}_e$, it sets $J = (\frac{n}{4}, \frac{n}{2}]$. The definition of a decodable permutation
now implies that the fraction of items in $\{j_1, \dots, j_{2m}\}$ that belong to $(\frac{3n}{8}, \frac{n}{2}]$ is at least […]; let
us call the remaining items in $\{j_1, \dots, j_{2m}\}$ “misplaced”. For each $j_\ell$ that is not misplaced, $\mathrm{ALG}_e$
correctly deduces the corresponding value $y_{i_\ell}$ unless item $j_\ell - \frac{n}{8}$ also belongs to $\{j_1, \dots, j_{2m}\}$ (in
which case it is a misplaced item). Hence each misplaced item contributes to potentially two errors,
meaning that at most […] fraction of the bits in $y$ differ from the corresponding bit in $A(x)$. These
strings have length $2m$, so we have shown their Hamming distance is at most […]. Lemma 14 now
ensures that with probability at least $\frac{1}{2}$, $\mathrm{ALG}_e$ decodes the appropriate value of $x$. When this
happens, it correctly selects item $n$ from the second half of the input sequence. Our assumption
that $\sigma$ is decodable means that item $n$ is the item with maximum value, which completes the proof
that $V^{\underline{\pi}_e}(\mathrm{ALG}_e, \sigma) \ge \frac{1}{2}$.

To prove the second statement in the lemma, we can use ALG to guess the value of $g(x)$ for
any input $x \in \{0,1\}^m$ by the following simulation procedure. First, define a permutation $\pi'(x)$ by
running Algorithm 3 with random string $x$, omitting the final step of swapping the items in positions
$n$ and $n/2 + g(x)$; note that this means that $\pi'(x)$, unlike $\pi(x)$, can be constructed from input $x$
by an $A$-augmented circuit of polynomial size. Now simulate ALG on the input sequence $\pi'(x)$,
observe the time $t$ when it selects an item, and output $t - \frac{n}{2}$. The $A$-augmented circuit complexity
of this simulation procedure is at most poly$(n)$ times the $A$-augmented circuit complexity of the
stopping rule implemented by ALG, and the fraction of inputs $x$ on which it guesses $g(x)$ correctly
is precisely $V^{\underline{\pi}_e}(\mathrm{ALG}, \iota)$. (To verify this last statement, note that ALG makes its selection at time
$t = \frac{n}{2} + g(x)$ when observing input sequence $\pi(x)$ if and only if it also makes its selection at time
$t$ when observing input sequence $\pi'(x)$, because the two input sequences are indistinguishable to
comparison-based algorithms at that time.) Hence, if $V^{\underline{\pi}_e}(\mathrm{ALG}, \iota) > \frac{4}{n}$ then the stopping rule of
ALG cannot be implemented by circuits of size […]. □

B.4 Proofs deferred from ??

B.4.1 Full proof of Theorem ??


We start by proving the following lemmas, which turn out to be critical for the analysis of $\mathrm{ALG}(U, k, q)$
under non-uniform permutation distributions. In fact, these lemmas capture the fact that if the
membership random variables of different items for the random set $S$ (and $S^c$) are almost pairwise
independent (rather than mutually independent), then we still preserve enough of the probabilistic
properties that are needed in the analysis of the algorithm proposed by [25].

Lemma 17. Suppose $\pi$ is drawn from a permutation distribution satisfying $(p, q, \delta)$-BIP for $p \ge 2$
and $S$ is the set of items $x \in U$ that arrive in one of the first $\tau(q)$ blocks, where $\tau(q)$ is independently
drawn from $\mathrm{Binom}(q, 1/2)$. Then for any $T \subseteq U$ such that $\delta \le $ […], we have

1. $\mathbf{E}[|T \cap S|] \in [(1-\delta)|T|/2, (1+\delta)|T|/2]$

2. $\mathbf{E}[\mathrm{val}(T \cap S)] \in [(1-\delta)\mathrm{val}(T)/2, (1+\delta)\mathrm{val}(T)/2]$

3. $\Pr[|T \cap S| \ge |T|/2 + a] \le \frac{|T|}{2a^2}$

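Proof. For $x \in U$, let $X_x$ be a 0/1 variable indicating if $x \in S$. We have

\[ \mathbf{E}[X_x] = \sum_{i=1}^{q} \Pr[x \text{ is in block } i] \Pr[\tau \ge i] \ge (1-\delta)\frac{1}{q}\sum_{i=1}^{q} \Pr[\tau \ge i] = (1-\delta)\frac{1}{q}\mathbf{E}[\tau] = \frac{1-\delta}{2} . \]

Analogously, we get $\mathbf{E}[X_x] \le \frac{1+\delta}{2}$.

Claims 1 and 2 now follow from linearity of expectation, e.g., $\mathbf{E}[|T \cap S|] = \sum_{x \in T} \mathbf{E}[X_x] \ge \frac{1-\delta}{2}|T|$.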

To show Claim 3, we use that for $x \ne x'$, we have $\mathbf{E}[X_x X_{x'}] \le $ […]. This implies

\[ \mathbf{E}[|T \cap S|^2] \le \mathbf{E}[|T \cap S|] + |T|(|T| - 1) \cdot […] . \]

By Markov's inequality, we get

\[ \Pr[|T \cap S| \ge |T|/2 + a] \le \Pr\big[(|T \cap S| - |T|/2)^2 \ge a^2\big] \le \frac{1}{a^2}\mathbf{E}\big[(|T \cap S| - |T|/2)^2\big] . \]

Using linearity of expectation and the bounds obtained so far, we get

\[ \mathbf{E}\big[(|T \cap S| - |T|/2)^2\big] = \mathbf{E}\big[|T \cap S|^2\big] - |T|\,\mathbf{E}[|T \cap S|] + \frac{|T|^2}{4} \le […], \]

where the last inequality is true because of […]. □


Lemma 18. Suppose $\pi$ is drawn from a permutation distribution satisfying $(p, q, \delta)$-BIP for some
$p \ge 2$ and $S$ is as defined in Lemma 17. Let $Y_1$ be the (possibly negative) random variable such that the
$(k/2)$-th item in the sorted-by-value list of items in $S$ is the $(k + Y_1)$-th in the sorted-by-value list of
items in $U$. Then $\mathbf{E}[|Y_1|] = O(\sqrt{k})$.


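Proof. We have $\mathbf{E}[|Y_1|] = \sum_{i=1}^{\infty} \Pr[|Y_1| \ge i] = \sum_{i=1}^{\infty} \Pr[Y_1 \ge i] + \sum_{i=1}^{\infty} \Pr[Y_1 \le -i]$. We
bound each of the terms separately. For a fixed $i$, look at the event $Y_1 \le -i$. This event is
equivalent to the event that the number of items in $S$ among the $k - i$ highest-valued items is at least
$k/2$. Let us define $r = k - i$. Furthermore, let $T_r$ be the set of the $r$ highest-valued items. Using
Lemma 17 we have:

\[ \Pr[Y_1 \le -i] = \Pr[|T_r \cap S| \ge r/2 + i/2] \le \frac{2r}{i^2} \qquad (10) \]

So we have

\[ \sum_{i=1}^{\infty} \Pr[Y_1 \le -i] \le \sum_{i=1}^{\lceil\sqrt{k}\rceil} 1 + \sum_{i=\lceil\sqrt{k}\rceil+1}^{\infty} \frac{2k}{i^2} \le 1 + \sqrt{k} + 2k \sum_{i=\lceil\sqrt{k}\rceil+1}^{\infty} \frac{1}{i^2} \le 1 + \sqrt{k} + 2k \int_{\sqrt{k}}^{\infty} \frac{1}{x^2}\,dx = 3\sqrt{k} + 1 = O(\sqrt{k}) \qquad (11) \]

Now, let's consider the event $Y_1 \ge i$. This event implies that the number of items of $S^c$ among the
$k$ highest-valued items is at least $k/2 + i$. Again, using Lemma 17 we have:

\[ \Pr[Y_1 \ge i] \le \Pr[|T_k \cap S^c| \ge k/2 + i] \le \frac{k}{2i^2} \qquad (12) \]

and hence we have

\[ \sum_{i=1}^{\infty} \Pr[Y_1 \ge i] \le \sum_{i=1}^{\lceil\sqrt{k}\rceil} 1 + \sum_{i=\lceil\sqrt{k}\rceil+1}^{\infty} \frac{k}{2i^2} \le 1 + \sqrt{k} + \frac{k}{2} \int_{\sqrt{k}}^{\infty} \frac{1}{x^2}\,dx = \frac{3}{2}\sqrt{k} + 1 = O(\sqrt{k}) \qquad (13) \]

which completes the proof. □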


Now we start proving the theorem. Basically, we prove that for any fixed $k$ there exists a function
$\varepsilon(k, q, \delta)$, non-increasing w.r.t. $q$, such that $\mathrm{ALG}(U, k, q)$ is $\big(1 - O(\frac{1}{k^{1/3}}) - \varepsilon(k, q, \delta)\big)$-competitive,
and $\varepsilon$ goes to 0 as $q \to \infty$ and $\delta \to 0$ for a fixed $k$. First, without loss of generality, we modify the
values so that if a value is among the $k$ highest it remains the same, and otherwise it is set to 0. This
doesn't change the sum of the values of the $k$ highest items, and just weakly decreases the values of
items picked by any algorithm. Now, run the algorithm with modified values. Let $A$ be the set
of items picked by $\mathrm{ALG}(U, k, q)$ and $O$ be the subset of $k$ highest-value items under $\sigma$. Define
$S = \{x \in U : \pi(x) \le\,$[…]$\}$ to be the set sampled before the threshold and $S^c = U \setminus S$ be its
complement. Suppose $v_0$ is the value of the $(k/2)$-th highest-valued item in $S$ (if $|S| < \frac{k}{2}$, set $v_0 = 0$).
Moreover, define the value function $\mathrm{val}(\cdot)$ to be the sum of values of the input set of items under $\sigma$.

Fixing $\sigma$, we prove the claim by induction over $k$. The case $k = 1$ is exactly the case of a single
secretary, which is analyzed in Section 2.1. For general $k$, we first run $\mathrm{ALG}(U \cap S, k/2, \tau(q))$ to give us
$A \cap S$. Note that the ordering of arrivals of items in $S$ satisfies $(p, \tau(q), \delta)$-BIP. So, by the induction
hypothesis and conditioned on the set $S$ we have


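\[ \mathbf{E}[\mathrm{val}(A \cap S) \mid S] \ge \mathbf{E}\Big[\mathrm{val}([O \cap S]_{k/2})\Big(1 - O\big(\tfrac{1}{k^{1/3}}\big) - \varepsilon(k/2, \tau, \delta)\Big) \,\Big|\, S\Big] \qquad (14) \]

We can lower-bound the right-hand side further as follows.

\[ \mathbf{E}\Big[\mathrm{val}([O \cap S]_{k/2})\Big(1 - O\big(\tfrac{1}{k^{1/3}}\big) - \varepsilon(k/2, \tau, \delta)\Big) \,\Big|\, S\Big] \ge \mathbf{E}[\mathrm{val}([O \cap S]_{k/2}) \mid S] - \mathrm{val}(O)\Big(O\big(\tfrac{1}{k^{1/3}}\big) + \mathbf{E}[\varepsilon(k/2, \tau, \delta) \mid S]\Big), \qquad (15) \]

and by taking expectation with respect to $S$ we have

\[ \mathbf{E}[\mathrm{val}(A \cap S)] \ge \mathbf{E}[\mathrm{val}([O \cap S]_{k/2})] - \mathrm{val}(O)\Big(O\big(\tfrac{1}{k^{1/3}}\big) + \mathbf{E}[\varepsilon(k/2, \tau, \delta)]\Big) \qquad (16) \]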


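Now suppose $I\{\cdot\}$ is the indicator function. One can easily decompose $\varepsilon(k/2, \tau, \delta)$ as follows.

\[ \varepsilon(k/2, \tau, \delta) = \varepsilon(k/2, \tau, \delta) I\{\tau \ge q/4\} + \varepsilon(k/2, \tau, \delta) I\{\tau < q/4\} \le \varepsilon(k/2, q/4, \delta) + I\{\tau < q/4\} \qquad (17) \]

where the last inequality is true because $\varepsilon(k/2, \tau, \delta)$ is non-increasing w.r.t. $\tau$. Now by taking
expectations of both sides of (17), we have:

\[ \mathbf{E}[\varepsilon(k/2, \tau, \delta)] \le \varepsilon(k/2, q/4, \delta) + \Pr[\tau < q/4] = \varepsilon(k/2, q/4, \delta) + \Pr[\tau < \tfrac{q}{2}(1 - 1/2)] \le \varepsilon(k/2, q/4, \delta) + e^{-q/16} \qquad (18) \]

where in the last inequality we used the Chernoff bound, as $\tau$ is drawn from $\mathrm{Binom}(q, 1/2)$. Now, fix
$\varepsilon'$. For a given $\varepsilon'$ we have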

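\[ \mathbf{E}[\mathrm{val}([O \cap S]_{k/2})] = \mathbf{E}[\mathrm{val}(O \cap S) I\{|O \cap S| \le k/2\}] + \mathbf{E}[\mathrm{val}([O \cap S]_{k/2}) I\{|O \cap S| > k/2\}] \]
\[ \ge \mathbf{E}[\mathrm{val}(O \cap S) I\{|O \cap S| \le k/2\}] + \mathbf{E}\Big[\frac{k/2}{|O \cap S|}\mathrm{val}(O \cap S) I\{|O \cap S| > k/2\}\Big] \]
\[ \ge \mathbf{E}[\mathrm{val}(O \cap S) I\{|O \cap S| \le k/2\}] + \frac{1}{1+\varepsilon'}\mathbf{E}\Big[\mathrm{val}(O \cap S) I\Big\{\frac{k}{2}(1+\varepsilon') \ge |O \cap S| > k/2\Big\}\Big] \qquad (19) \]

Also, we have

\[ \frac{1}{1+\varepsilon'}\mathbf{E}\Big[\mathrm{val}(O \cap S) I\Big\{\frac{k}{2}(1+\varepsilon') \ge |O \cap S| > k/2\Big\}\Big] \]
\[ = \frac{1}{1+\varepsilon'}\mathbf{E}[\mathrm{val}(O \cap S) I\{|O \cap S| > k/2\}] - \frac{1}{1+\varepsilon'}\mathbf{E}\Big[\mathrm{val}(O \cap S) I\Big\{|O \cap S| > \frac{k}{2}(1+\varepsilon')\Big\}\Big] \]
\[ \ge \frac{1}{1+\varepsilon'}\mathbf{E}[\mathrm{val}(O \cap S) I\{|O \cap S| > k/2\}] - \mathrm{val}(O)\Pr\Big[|O \cap S| > \frac{k}{2}(1+\varepsilon')\Big] \]
\[ \overset{(1)}{\ge} \frac{1}{1+\varepsilon'}\mathbf{E}[\mathrm{val}(O \cap S) I\{|O \cap S| > k/2\}] - \frac{2}{k\varepsilon'^2}\mathrm{val}(O) \qquad (20) \]

in which (1) is true because of Lemma 17. Combining (19) with (20) we have:

\[ \mathbf{E}[\mathrm{val}([O \cap S]_{k/2})] \ge \frac{1}{1+\varepsilon'}\mathbf{E}[\mathrm{val}(O \cap S)] - \frac{2}{k\varepsilon'^2}\mathrm{val}(O) \ge \mathbf{E}[\mathrm{val}(O \cap S)] - \Big(\varepsilon' + \frac{2}{k\varepsilon'^2}\Big)\mathrm{val}(O). \qquad (21) \]

Finally, by combining (16), (18) and (21) and setting $\varepsilon' = \frac{1}{k^{1/3}}$, we have

\[ \mathbf{E}[\mathrm{val}(A \cap S)] \ge \mathbf{E}[\mathrm{val}(O \cap S)] - \mathrm{val}(O)\Big(O\big(\tfrac{1}{k^{1/3}}\big) + e^{-q/16} + \varepsilon(k/2, q/4, \delta)\Big) \qquad (22) \]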

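Next, we try to lower-bound $\mathbf{E}[\mathrm{val}(A \cap S^c)]$ by $\mathbf{E}[\mathrm{val}(O \cap S^c)]$. Let us define the random variable
$Q = |A \cap S^c|$ to be the number of items the algorithm picked from $S^c$. We have $\mathbf{E}[\mathrm{val}(A \cap S^c)] =
\sum_{x=0}^{k/2} \mathbf{E}[\mathrm{val}(A \cap S^c) I\{Q = x\}]$. Now we look at each term $\mathbf{E}[\mathrm{val}(A \cap S^c) I\{Q = x\}]$, and we try to
lower-bound it with $\mathbf{E}[\mathrm{val}(O \cap S^c) I\{Q = x\}]$ for different values of $x$. Consider two cases:

Case 1, when $x < \frac{k}{2}$: In this case $v_0 > 0$ and all items in $S$ with value at least $v_0$ are in $O$.
We know the number of items in $S^c$ that have value at least $v_0$ is $x$. If we look at items in $O \cap S^c$,
all items in $A \cap S^c$ are also in $O \cap S^c$, and in addition we have at most $k - (k/2 + x) = k/2 - x$
items in $O \cap S^c$, all of which have value at most $v_0$. Hence, as the value of any item in $A \cap S^c$ is
at least $v_0$, the following holds deterministically:

\[ \mathrm{val}(O \cap S^c) - \mathrm{val}(A \cap S^c) = \mathrm{val}(\{x \in O \cap S^c : \mathrm{val}(x) < v_0\}) \le \frac{k/2 - |A \cap S^c|}{|A \cap S^c|}\mathrm{val}(A \cap S^c) = \Big(\frac{k}{2Q} - 1\Big)\mathrm{val}(A \cap S^c) \qquad (23) \]

which implies $\mathrm{val}(A \cap S^c) \ge \frac{2Q}{k}\mathrm{val}(O \cap S^c)$ when $x < k/2$. So for $x < k/2$,

\[ \mathbf{E}[\mathrm{val}(A \cap S^c) I\{Q = x\}] \ge \mathbf{E}\Big[\frac{2Q}{k}\mathrm{val}(O \cap S^c) I\{Q = x\}\Big] \qquad (24) \]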


Case 2, when x = |: In this case either vq > 0, which implies at least there are k/2 items in On S. 
As algorithm also picks k/2 items and so A n = O n for which we are done. Otherwise, 
suppose Vo = 0. We know the permutation distribution generating tt satishes the (p, q, J)-BIP some 
p> k, and hence it satisfies {k, q, h)-BIP. So, based on Theorem[2]it also satisfies {k, 6 + y)-UIOP. 
Roughly speaking, if you look at any subset of elements with cardinality at most k, their induced 
ordering is almost uniformly distributed (within an error of h -|- ^). We know in this case algorithm 

picks I items (all of them in O n S''^), and in fact it picks the first | elements of O n in the 
ordering of elements in O n induced by the permutation vr. Suppose X = \0 (1 S"^| — k/2. Then 


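\[
\begin{aligned}
\mathbb{E}\Bigl[\mathrm{val}(\mathcal{A} \cap S^c)\,\mathbb{1}\bigl\{Q = \tfrac{k}{2}\bigr\}\Bigr]
&\ge \mathbb{E}\Bigl[\mathrm{val}(\text{first } k/2 \text{ elements of } \mathcal{O} \cap S^c \text{ in the ordering } \pi)\,\mathbb{1}\bigl\{Q = \tfrac{k}{2}\bigr\}\Bigr] \\
&= \mathbb{E}\Bigl[\mathrm{val}(\mathcal{O} \cap S^c)\,\mathbb{1}\bigl\{Q = \tfrac{k}{2}\bigr\}\Bigr]
 - \mathbb{E}\Bigl[\mathrm{val}(\text{last } X \text{ elements of } \mathcal{O} \cap S^c \text{ in the ordering } \pi)\,\mathbb{1}\bigl\{Q = \tfrac{k}{2}\bigr\}\Bigr] \\
&\ge \mathbb{E}\Bigl[\mathrm{val}(\mathcal{O} \cap S^c)\,\mathbb{1}\bigl\{Q = \tfrac{k}{2}\bigr\}\Bigr]
 - \mathbb{E}\bigl[\mathrm{val}(\text{last } X \text{ elements of } \mathcal{O} \cap S^c \text{ in the ordering } \pi)\bigr].
\end{aligned}
\tag{25}
\]

For a fixed set $S$, we have the $(k, \delta + \frac{k^2}{q})$-UIOP for the elements in $\mathcal{O} \cap S^c$ (this is an order-oblivious fact), and hence the induced ordering of the elements in $\mathcal{O} \cap S^c$ is almost uniform. So, we have

\[
\begin{aligned}
\mathbb{E}\bigl[\mathrm{val}(\text{last } X \text{ elements of } \mathcal{O} \cap S^c \text{ in the ordering } \pi) \,\big|\, S\bigr]
&\le \Bigl(1 + \delta + \frac{k^2}{q}\Bigr)\,\mathbb{E}\Bigl[\mathrm{val}(\mathcal{O} \cap S^c)\,\frac{X}{|\mathcal{O} \cap S^c|} \,\Big|\, S\Bigr] \\
&\le \Bigl(1 + \delta + \frac{k^2}{q}\Bigr)\,\mathrm{val}(\mathcal{O})\,\mathbb{E}\Bigl[\frac{|\mathcal{O} \cap S^c| - k/2}{|\mathcal{O} \cap S^c|} \,\Big|\, S\Bigr].
\end{aligned}
\tag{26}
\]

Now, by taking the expectation with respect to $S$ and combining it with (25), we have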


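\[
\mathbb{E}\Bigl[\mathrm{val}(\mathcal{A} \cap S^c)\,\mathbb{1}\bigl\{Q = \tfrac{k}{2}\bigr\}\Bigr]
\ge \mathbb{E}\Bigl[\mathrm{val}(\mathcal{O} \cap S^c)\,\mathbb{1}\bigl\{Q = \tfrac{k}{2}\bigr\}\Bigr]
- \Bigl(1 + \delta + \frac{k^2}{q}\Bigr)\,\mathrm{val}(\mathcal{O})\,\mathbb{E}\Bigl[\frac{|\mathcal{O} \cap S^c| - k/2}{|\mathcal{O} \cap S^c|}\Bigr].
\tag{27}
\]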

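Moreover, one can use Lemma 17 to find an upper bound on the error term in (26). Fix any $\varepsilon'$. Now we have

\[
\frac{|\mathcal{O} \cap S^c| - k/2}{|\mathcal{O} \cap S^c|}
= \frac{|\mathcal{O} \cap S^c| - k/2}{|\mathcal{O} \cap S^c|}\,\mathbb{1}\bigl\{|\mathcal{O} \cap S^c| \le k/2 + \varepsilon'\bigr\}
+ \frac{|\mathcal{O} \cap S^c| - k/2}{|\mathcal{O} \cap S^c|}\,\mathbb{1}\bigl\{|\mathcal{O} \cap S^c| > k/2 + \varepsilon'\bigr\}.
\tag{28}
\]

By taking expectations on both sides of (28), setting $\varepsilon' = k^{2/3}$, and using Lemma 17, we have

\[
\mathbb{E}\Bigl[\frac{|\mathcal{O} \cap S^c| - k/2}{|\mathcal{O} \cap S^c|}\Bigr] \le \frac{3}{k^{1/3}}.
\tag{29}
\]

By combining (29) and (26) we have (note that $\delta < 1$)

\[
\mathbb{E}\bigl[\mathrm{val}(\mathcal{A} \cap S^c)\,\mathbb{1}\{Q = k/2\}\bigr]
\ge \mathbb{E}\Bigl[\frac{2Q}{k}\,\mathrm{val}(\mathcal{O} \cap S^c)\,\mathbb{1}\{Q = k/2\}\Bigr]
- \mathrm{val}(\mathcal{O})\Bigl(\frac{6}{k^{1/3}} + \frac{3k^{5/3}}{q}\Bigr),
\tag{30}
\]

as desired.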

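Now, by combining the above cases with each other, we have

\[
\begin{aligned}
\mathbb{E}\bigl[\mathrm{val}(\mathcal{A} \cap S^c)\bigr]
&\ge \mathbb{E}\Bigl[\frac{2Q}{k}\,\mathrm{val}(\mathcal{O} \cap S^c)\Bigr]
 - \mathrm{val}(\mathcal{O})\Bigl(\frac{6}{k^{1/3}} + \frac{3k^{5/3}}{q}\Bigr) \\
&\ge \mathbb{E}\bigl[\mathrm{val}(\mathcal{O} \cap S^c)\bigr]
 - \mathrm{val}(\mathcal{O})\Bigl(\frac{6}{k^{1/3}} + \frac{3k^{5/3}}{q} + \mathbb{E}\Bigl[\frac{k/2 - Q}{k/2}\Bigr]\Bigr).
\end{aligned}
\tag{31}
\]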


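Finally, by combining equations (22) and (31), we have

\[
\mathbb{E}\bigl[\mathrm{val}(\mathcal{A})\bigr]
\ge \mathrm{val}(\mathcal{O})\Bigl(1 - \mathbb{E}\Bigl[\frac{k/2 - Q}{k/2}\Bigr] - O\Bigl(\frac{1}{k^{1/3}}\Bigr) - \frac{3k^{5/3}}{q} - \epsilon(k/2, q/4, \delta)\Bigr).
\tag{32}
\]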


As can be seen, the question of obtaining a competitive ratio of $1 - o(1)$ boils down to upper-bounding $\mathbb{E}[k/2 - Q]$. To do so, we define the random variable $Y_1$ such that the $(k/2)$-th item in the sorted-by-value list of items in $S$ is the $(k + Y_1)$-th item in $\mathcal{U}$. Now we claim that $Q \ge k/2 - |Y_1|$. The proof is easy. If $v_0 = 0$, then the algorithm picks $k/2$ items from $S^c$ and we are done. Otherwise, there are $k + Y_1 - k/2 = k/2 + Y_1$ items in $S^c$ whose value is at least $v_0$. By a simple case analysis, if $Y_1 < 0$ then the algorithm picks all of those, and hence $Q \ge k/2 + Y_1 = k/2 - |Y_1|$. If $Y_1 \ge 0$, then the algorithm picks $k/2$ items, which is at least $k/2 - |Y_1|$, and we are again done. So $\mathbb{E}[k/2 - Q] \le \mathbb{E}[|Y_1|]$.


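Lemma [THl shows that $\mathbb{E}[|Y_1|] = O(\sqrt{k})$, and hence $\mathbb{E}\bigl[\frac{k/2 - Q}{k/2}\bigr] \le O(1/\sqrt{k})$. Hence, we have

\[
\begin{aligned}
\mathbb{E}\bigl[\mathrm{val}(\mathcal{A})\bigr]
&\ge \mathrm{val}(\mathcal{O})\Bigl(1 - O\Bigl(\frac{1}{k^{1/3}}\Bigr) - \frac{3k^{5/3}}{q} - \epsilon(k/2, q/4, \delta)\Bigr) \\
&\ge \mathrm{val}(\mathcal{O})\Bigl(1 - O\Bigl(\frac{1}{k^{1/3}}\Bigr) - \epsilon(k, q, \delta)\Bigr),
\end{aligned}
\tag{33}
\]

which completes the proof, as $\epsilon$ can be made arbitrarily small for large enough $q$ and small enough $\delta$.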
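The claim $Q \ge k/2 - |Y_1|$ used above can be illustrated with a short simulation. This is a hedged sketch rather than the paper's setting: it draws the arrival order uniformly at random (a stronger assumption than block independence), takes $S$ to be the first half of the order, and uses distinct positive values so that $v_0 > 0$ always holds. The function names are hypothetical.

```python
import random


def q_and_y1(values, order, k):
    """Return (Q, Y1) for one arrival order under the assumptions above."""
    half = len(order) // 2
    S, S_c = order[:half], order[half:]
    # v0 is the value of the (k/2)-th best item in S.
    v0 = sorted((values[i] for i in S), reverse=True)[k // 2 - 1]
    # By definition of Y1, the item with value v0 has overall rank k + Y1.
    rank = sum(1 for v in values.values() if v >= v0)
    y1 = rank - k
    # The algorithm takes at most k/2 items of S^c with value at least v0.
    q = min(k // 2, sum(1 for i in S_c if values[i] >= v0))
    return q, y1


def mean_abs_y1(n, k, trials, seed=0):
    """Check Q >= k/2 - |Y1| on every trial and return the average |Y1|."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        values = dict(enumerate(rng.sample(range(1, 10 * n), n)))
        order = list(range(n))
        rng.shuffle(order)
        q, y1 = q_and_y1(values, order, k)
        assert q >= k // 2 - abs(y1)
        total += abs(y1)
    return total / trials
```

For example, `mean_abs_y1(400, 16, 500)` averages $|Y_1|$ over 500 random orders. Under uniform arrival order the average should stay on the order of $\sqrt{k}$, in line with the $O(\sqrt{k})$ bound cited above.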


B.4.2 Full proof of Theorem 12

We define a randomized construction that defines an input and a probability distribution simultaneously. By the probabilistic method, this implies the statement.

Our bipartite graph has n vertices on the online and the offline side each. For each pair (j, i), we 

add the connecting edge with probability ^ — 8-^^ independently. In case j and i are connected, 

the edge weight is set to 'w{j,i) = 1 — e(j -|- i) for e = This way, the expected weight of the 
optimal solution is 0(n). 

To define the distribution over permutations Tii'.hl^ [n], we draw for the first ^ logn 

offline vertices $i \in R$ one permutation uniformly at random from the set of all permutations in which the neighbors of node $i$ come last. Afterwards, we draw one of these permutations $\pi_1, \ldots, \pi_\xi$ at random. We claim that this way, the probability distribution fulfills the $(k, \delta)$-uniform-induced-ordering property.

Fix $k$ distinct items $x_1, \ldots, x_k$. Note that we can ignore the fact that in any permutation the neighbors come last, as all of $x_1, \ldots, x_k$ have the same probability of corresponding to a neighbor. Therefore, we can directly follow the argument from Theorem 6. Let $Y_i = 1$ if $\pi_i(x_1) < \pi_i(x_2) < \cdots < \pi_i(x_k)$, and $Y_i = 0$ otherwise. As $\pi_i$ is drawn uniformly from the set of all permutations, we have $\Pr[Y_i = 1] = \frac{1}{k!}$. That is, we have $\mathbb{E}\bigl[\sum_{i=1}^{\xi} Y_i\bigr] = \frac{\xi}{k!}$. As the random variables $Y_i$ are independent, we can apply a Chernoff bound. This gives us

\[
\Pr\Bigl[\sum_{i=1}^{\xi} Y_i < (1 - \delta)\frac{\xi}{k!}\Bigr]
\le \exp\Bigl(-\frac{\delta^2}{2}\cdot\frac{\xi}{k!}\Bigr)
= n^{-(k+1)}.
\]

Note that if $\sum_{i=1}^{\xi} Y_i \ge (1 - \delta)\frac{\xi}{k!}$, then the respective sequence $x_1, \ldots, x_k \in \mathcal{U}$ has probability at least $(1 - \delta)\frac{1}{k!}$ when drawing one permutation from $\pi_1, \ldots, \pi_\xi$.

There are fewer than $n^k$ possible sequences. Therefore, applying a union bound, with probability at least $1 - \frac{1}{n}$ the bound is fulfilled for all sequences simultaneously, and therefore S fulfills the stated condition.
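The concentration step above can be illustrated empirically. This sketch draws plain uniform permutations (it deliberately ignores the "neighbors come last" constraint, which the argument says can be ignored) and measures how often a fixed tuple of items appears in increasing order. The function name, the item indices, and the sample sizes are illustrative assumptions.

```python
import math
import random


def induced_order_frequency(n, items, num_perms, seed=0):
    """Fraction of sampled permutations that place `items` in the given order."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_perms):
        perm = list(range(n))
        rng.shuffle(perm)  # perm[x] is the arrival position of item x
        positions = [perm[x] for x in items]
        # Positions are distinct, so equality with the sorted list means
        # pi(x_1) < pi(x_2) < ... < pi(x_k).
        if positions == sorted(positions):
            hits += 1
    return hits / num_perms


def target_probability(k):
    """The value 1/k! that the empirical frequency should concentrate around."""
    return 1 / math.factorial(k)
```

With `induced_order_frequency(50, [3, 17, 42], 20000)` the result should be close to `target_probability(3)`, that is, close to 1/6. The more permutations are sampled, the tighter the frequency concentrates, which is the effect the Chernoff bound quantifies.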

It now remains to show that the Korula-Pál algorithm performs poorly on this type of instance. The algorithm draws a transition point $r \sim \mathrm{Binom}(n, \frac{1}{2})$, before which it only observes the input and after which it starts a greedy allocation based on the vertices seen until round $r$ and the current vertex. It is important to remark that, for the tentative allocation, the other vertices seen between round $r$ and the current round are ignored. Only after a tentative edge has been selected is their allocation taken into consideration, in order to check whether the matching would still be feasible.
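The procedure described in the paragraph above can be sketched as follows. This is an illustrative reading of that description, not a reference implementation of the Korula-Pál algorithm: the function names, the edge dictionary format, and the default sampling probability `p=0.5` are assumptions made for the example.

```python
import random


def greedy_matching(online, edges):
    """Greedy matching restricted to the online vertices in `online`.

    edges: dict mapping (online_vertex, offline_vertex) -> weight.
    Edges are scanned by decreasing weight; an edge is kept if both ends are free.
    """
    candidates = sorted(
        ((w, j, i) for (j, i), w in edges.items() if j in online), reverse=True
    )
    used_online, used_offline, match = set(), set(), {}
    for w, j, i in candidates:
        if j not in used_online and i not in used_offline:
            match[j] = i
            used_online.add(j)
            used_offline.add(i)
    return match


def sample_then_greedy(arrival, edges, p=0.5, rng=None):
    """Observe the first r arrivals, then allocate tentatively and check feasibility."""
    rng = rng or random.Random(0)
    n = len(arrival)
    r = sum(rng.random() < p for _ in range(n))  # r ~ Binom(n, p)
    sample = set(arrival[:r])
    taken, result = set(), {}
    for j in arrival[r:]:
        # Tentative allocation: vertices that arrived between round r and now
        # are ignored; only the sample and the current vertex are used.
        tentative = greedy_matching(sample | {j}, edges).get(j)
        # Feasibility check against the edges that were actually selected.
        if tentative is not None and tentative not in taken:
            result[j] = tentative
            taken.add(tentative)
    return r, result
```

On an input where every late arrival is tentatively matched to the same offline vertex, only the first of those edges survives the feasibility check. The lower-bound construction relies on this effect.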

Let $\pi_i$ be the chosen permutation. That is, the neighbors of i come last. Let i have n — A
neighbors. We now claim that with high probability no neighbor of i comes before r, i.e., A < t.
Furthermore, after r essentially only neighbors of i arrive. This has the consequence that almost
all vertices are tentatively matched to i. However, only the first such edge is feasible.

Using Chernoff bounds, we get 


^ r n ;-1 ^ n ( , /inn \ f 1 lnnn\ 1 

Pr r< — — 2vnlnn =Pr n — r>— 1 + A\ - < exp —--16-— < — . 

12 J 2\ \ n j \ 6 n 2 J n 


Pr ^ ^ < Pr i? > ("l + Q 


— SV n In n 


In case H < r, the value of the solution is upper-bounded by j because every node is tentatively 
matched to a vertex of index at most i. As i < r, this gives a value bounded by ^ Inn.